ABSTRACT:



The FKG theorem says that the POSITIVE LATTICE CONDITION, an easily checkable hypothesis 
which holds for many natural families of events, implies POSITIVE ASSOCIATION, a very useful 
property. Thus there is a natural and useful theory of positively dependent events. There is, as yet, no 
corresponding theory of negatively dependent events. There is, however, a need for such a theory. This 
paper, unfortunately, contains no substantial theorems. Its purpose is to present examples that motivate 
a need for such a theory, give plausibility arguments for the existence of such a theory, outline a few 
possible directions such a theory might take, and state a number of specific conjectures which pertain 
to the examples and to a wish list of theorems. 
Philosophy:

The questions in this paper are motivated by several independent problems in combinatorial
probability, stochastic processes and statistical mechanics. For each of these problems, it seems that
progress will require (and engender) better understanding of what it means for a collection of random
variables to be "repelling" or mutually negatively dependent. The temptation is to try to copy the theory
of positively dependent random variables, since the FKG theorem and its offshoots give this theory a
powerful footing from which to prove correlation inequalities, limit theorems and so on. Perhaps it is
folly: no definition of mutual negative dependence has proved one tenth as useful as the lattice condition
for positively dependent variables. The purpose of this paper is to lay the groundwork for whatever
progress is possible in this area. The main goal is to state some conjectured implications which would
bridge the gap between easily verifiable conditions and useful conclusions. A second purpose is to collect
together examples and counterexamples that will be useful in forming hypotheses, and a third is to update
previous surveys by collecting the relevant known results and adding a few more. The scope of this paper
is limited to binary-valued random variables, in the hope that eliminating the metric and order properties
of the real numbers in favor of the two point set $\{0, 1\}$ will better reveal what is essential to the
questions at hand.

1 Statement of the problem and some motivation 
1.1 Definition of positive and negative association 

Let $\mathcal{B}_n$ be the Boolean lattice containing $2^n$ elements, each element being thought of as a
sequence of zeros and ones of length $n$, or as a function from $\{1, \ldots, n\}$ to $\{0, 1\}$, or as a
subset of $\{1, \ldots, n\}$. Let $\mu$ be a nonnegative function on the lattice with
$\sum_{x \in \mathcal{B}_n} \mu(x) = 1$. Then $\mu$ is a probability measure on $\mathcal{B}_n$ and each
coordinate function is a binary random variable, denoted $X_j$, $j = 1, \ldots, n$. Sometimes we replace
the base set $\{1, \ldots, n\}$ by a different index set arising naturally in an application, such as the
set of edges of a graph.

In order to make an analogy we review the facts about positive dependence. The measure $\mu$ is said
to be positively associated (cf. Esary, Proschan and Walkup (1967)) if

$$\int fg \, d\mu \ge \int f \, d\mu \int g \, d\mu \qquad (1)$$

for every pair of increasing functions $f$ and $g$ on $\mathcal{B}_n$. This is a strong correlation
inequality from which many others may be derived, and from which distributional limit theorems also
follow; see Newman (1980). Positive association is implied by the following local (and therefore often
more checkable) positive lattice condition (Fortuin, Kasteleyn and Ginibre (1971); see also Ahlswede and
Daykin (1979) for a more general proof):

Theorem 1.1 (FKG) If the following condition holds then $\mu$ is positively associated:

$$\mu(x \vee y)\,\mu(x \wedge y) \ge \mu(x)\,\mu(y). \qquad (2)$$

In fact, one only needs to check this in the case where $x$ and $y$ each cover $x \wedge y$ (an element
$u$ covers an element $v$ if $u > v$ and if $u \ge w \ge v$ implies $w \in \{u, v\}$). This immediately
allows verification of positive association for basic examples such as the ferromagnetic Ising model,
certain urn models, and, in the continuous case, multivariate normals, gammas, and many more
distributions. Furthermore, the class of measures satisfying the lattice condition (2) is easily seen to
be closed under Cartesian products, pointwise products, and, most importantly, under integrating out any
of the variables (i.e., any projection of $\mu$ onto the space $\{0, 1\}^E$ for
$E \subseteq \{1, \ldots, n\}$ will also satisfy (2)).
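
The lattice condition and the association inequality can both be checked by brute force on a small lattice. The following is only an illustrative sketch: the measure `mu` (three coordinates in a row that prefer to agree with their neighbours, with a hypothetical coupling `beta = 0.7`) and all function names are assumptions made for this example, not objects taken from the text.

```python
from itertools import combinations, product
from math import exp

n = 3
points = list(product((0, 1), repeat=n))

def join(x, y):
    return tuple(max(a, b) for a, b in zip(x, y))

def meet(x, y):
    return tuple(min(a, b) for a, b in zip(x, y))

def leq(x, y):
    return all(a <= b for a, b in zip(x, y))

# Hypothetical ferromagnetic-style weights: neighbouring coordinates prefer to agree.
beta = 0.7
weights = {x: exp(beta * sum(x[i] == x[i + 1] for i in range(n - 1))) for x in points}
total = sum(weights.values())
mu = {x: w / total for x, w in weights.items()}

def satisfies_positive_lattice_condition(mu, tol=1e-12):
    """Check mu(x v y) mu(x ^ y) >= mu(x) mu(y) for all pairs x, y."""
    return all(mu[join(x, y)] * mu[meet(x, y)] >= mu[x] * mu[y] - tol
               for x in points for y in points)

def up_sets():
    """Enumerate all upwardly closed subsets of the lattice by brute force."""
    found = []
    for r in range(len(points) + 1):
        for chosen in combinations(points, r):
            s = frozenset(chosen)
            if all(y in s for x in s for y in points if leq(x, y)):
                found.append(s)
    return found

def is_positively_associated(mu, tol=1e-12):
    """Test the correlation inequality on indicators of up-sets.

    Every increasing function is a constant plus a nonnegative combination of
    such indicators, so by bilinearity of covariance this covers all f and g.
    """
    prob = lambda s: sum(mu[x] for x in s)
    ups = up_sets()
    return all(prob(a & b) >= prob(a) * prob(b) - tol for a in ups for b in ups)

print(satisfies_positive_lattice_condition(mu), is_positively_associated(mu))
```

The enumeration is exponential in $2^n$ and is meant only for very small $n$; the point of the FKG theorem is precisely that the local check replaces the global one.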

Negative dependence, by contrast, is not nearly as robust. First, since a random variable is always
positively correlated with itself, one cannot expect all monotone functions to be negatively correlated.
The usual definition of negative association of a measure $\mu$ (cf. Joag-Dev and Proschan (1983)) is that

$$\int fg \, d\mu \le \int f \, d\mu \int g \, d\mu \qquad (3)$$

for increasing functions $f$ and $g$, provided that $f$ depends only on a subset $A$ of the $n$ variables
and $g$ depends only on a subset disjoint from $A$. Secondly, whereas in the positive case one may have
$\mathbb{E}X_iX_j$



prevents the typical term $\mathrm{Cov}(X_i, X_j)$ from having a significantly negative value. Thirdly,
the negative lattice condition, namely (2) with the inequality reversed, is not closed under projections.
Thus one cannot expect it to imply negative association and indeed it does not.
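
A small example makes the failure under projection concrete. The construction below is an assumption introduced for illustration and is not taken from the source: $X_3$ is a fair coin, and given $X_3$ the variables $X_1, X_2$ are i.i.d. Bernoulli with a hypothetical parameter `a` when $X_3 = 1$ and `b` when $X_3 = 0$, where `a < b`. The sketch checks by exact arithmetic that the full measure satisfies the reversed lattice condition while its projection onto the first two coordinates does not.

```python
from fractions import Fraction
from itertools import product

a, b = Fraction(1, 5), Fraction(4, 5)   # hypothetical parameters with a < b

def bernoulli(p, x):
    return p if x == 1 else 1 - p

# Joint law of (X1, X2, X3): X3 is a fair coin, X1 and X2 are conditionally i.i.d.
mu = {}
for x1, x2, x3 in product((0, 1), repeat=3):
    p = a if x3 == 1 else b
    mu[(x1, x2, x3)] = Fraction(1, 2) * bernoulli(p, x1) * bernoulli(p, x2)

def satisfies_negative_lattice_condition(m):
    """Check m(x v y) m(x ^ y) <= m(x) m(y) for all pairs x, y."""
    for x in m:
        for y in m:
            join = tuple(max(s, t) for s, t in zip(x, y))
            meet = tuple(min(s, t) for s, t in zip(x, y))
            if m[join] * m[meet] > m[x] * m[y]:
                return False
    return True

def project(m, keep):
    """Integrate out all coordinates not listed in keep."""
    out = {}
    for x, p in m.items():
        key = tuple(x[i] for i in keep)
        out[key] = out.get(key, 0) + p
    return out

proj = project(mu, keep=(0, 1))
p1 = proj[(1, 0)] + proj[(1, 1)]
p2 = proj[(0, 1)] + proj[(1, 1)]
cov = proj[(1, 1)] - p1 * p2            # covariance of X1 and X2 after projection

nlc_full = satisfies_negative_lattice_condition(mu)
nlc_proj = satisfies_negative_lattice_condition(proj)
print(nlc_full, nlc_proj, cov)
```

Here $X_1$ and $X_2$ are each negatively related to $X_3$ and conditionally independent given it, so forgetting $X_3$ leaves them positively correlated.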

Contrasting the definitions of positive and negative association shows that the inequality (1) comes
from two sources. The first is from autocorrelation when $f$ and $g$ depend on the same variable in the
same direction; thus for independent random variables, strict inequality in (1) occurs if $f$ and $g$ both
depend on a common variable. The second is from positive interdependence of the variables which
contributes even when $f$ and $g$ depend on disjoint subsets. This leads immediately to a question on
positive association which, while not directly pertaining to the subject of negative dependence, might
shed light on how to disentangle inter- and auto-correlation.



$$\mu(x \vee y)\,\mu(x \wedge y) \ge \mu(x)\,\mu(y). \qquad (2)$$
Question 1 If one assumes (1) only for $f$ and $g$ depending on disjoint subsets of the variables, does
the inequality follow for all increasing $f$ and $g$?

This elementary question has not, as far as I know, been posed or answered in print. 

The reverse-inequality analogue of (1) for product measures is the van den Berg-Kesten-Reimer
inequality:

$$\mu(A \,\square\, B) \le \mu(A)\,\mu(B). \qquad (4)$$

Here $A \,\square\, B$ is the event that $A$ and $B$ happen for "disjoint reasons":
$\omega \in A \,\square\, B$ if there are disjoint subsets $S(\omega)$ and $T(\omega)$ of
$\{1, \ldots, n\}$ such that $A$ contains the set of all configurations agreeing with $\omega$ on $S$ and
$B$ contains the set of all configurations agreeing with $\omega$ on $T$. This leads to a different but
also somewhat natural definition of negative association, denoted here BKRNA (Berg-Kesten-Reimer
negative association): a measure $\mu$ has the BKRNA property if (4) holds for all sets $A$ and $B$.
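
The "disjoint reasons" operation is easy to compute by brute force for small $n$. The sketch below is illustrative only: the function names, the product measure with hypothetical marginals `p`, and the random sampling of events are all assumptions made for this example. It uses the observation that witness sets are closed under enlargement, so the second witness may be taken to be the full complement of the first.

```python
import random
from fractions import Fraction
from itertools import product

n = 3
omegas = list(product((0, 1), repeat=n))
full = (1 << n) - 1                      # coordinate subsets are encoded as bitmasks

def cylinder(omega, S):
    """All configurations agreeing with omega on the coordinate set S."""
    return {x for x in omegas
            if all(x[i] == omega[i] for i in range(n) if (S >> i) & 1)}

def box(A, B):
    """Configurations omega for which disjoint S, T certify A and B respectively."""
    out = set()
    for omega in omegas:
        for S in range(1 << n):
            # If some T disjoint from S works, so does the whole complement of S.
            if cylinder(omega, S) <= A and cylinder(omega, full ^ S) <= B:
                out.add(omega)
                break
    return out

p = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 4))   # hypothetical marginals

def mu(A):
    """Product measure of the event A, in exact arithmetic."""
    total = Fraction(0)
    for x in A:
        w = Fraction(1)
        for xi, pi in zip(x, p):
            w *= pi if xi else 1 - pi
        total += w
    return total

rng = random.Random(0)
violations = 0
for _ in range(300):
    A = {x for x in omegas if rng.random() < 0.5}
    B = {x for x in omegas if rng.random() < 0.5}
    if mu(box(A, B)) > mu(A) * mu(B):
        violations += 1
print(violations)
```

For increasing events depending on disjoint coordinates the operation reduces to intersection, while `box(A, A)` is empty when `A` depends on a single coordinate, since that coordinate cannot be used twice.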

BKRNA has some claim to being "the negative version" of positive association, since instead of
reversing the inequality in (1) and then restricting $f$ and $g$, we choose a different inequality to
reverse which holds in the independent case for all $f$ and $g$. The BKRNA property has been discussed in
the literature, but has not been fruitful. This may be due to the fact that even in the independent case,
where the proof of (1) has been known for 40 years (see Harris 1960), the inequality (4) turned out to
be quite hard to prove. A proof when $A$ and $B$ are both up-sets (see definition next paragraph) was
given in van den Berg and Kesten (1985), generalized to the case where $A$ and $B$ had the next level of
complexity (up-set intersect down-set) by van den Berg and Fiebig (1987), and then proved in complete
generality by Reimer in a manuscript yet to be published. In view of this difficulty, it seems unlikely
that proving (4) for some interesting non-product measure $\mu$ will be possible, let alone be the easiest
way to establish a desired property of $\mu$. Consequently, the remainder of the paper deals with classical
negative association, where we restrict the test functions $f$ and $g$ instead of changing the binary set
operation.

1.2 Stochastic increase and decrease 

The notions of stochastic domination and stochastic increase and decrease are useful when defining
positive and negative dependence properties, so we review them here. Let $\mu$ and $\nu$ be measures on a
partially ordered set, $S$. An event $A \subseteq S$ is said to be upwardly closed (or an up-set) if
$x \in A$ and $y \ge x$ implies $y \in A$. Often $S = \mathcal{B}_n$, the Boolean lattice of rank $n$, in
which case this is the same as the indicator of $A$ being an increasing function of the coordinates. We
say that $\mu$ stochastically dominates $\nu$ (written $\mu \succeq \nu$) if $\mu(A) \ge \nu(A)$ for every
upwardly closed event $A$. The condition $\mu_1 \succeq \mu_2 \succeq \cdots \succeq \mu_n$ is well known
to be equivalent to the existence of a random sequence $(X_1, \ldots, X_n)$ such that $X_j \sim \mu_j$ for
each $j$ and $X_j \ge X_k$ for $1 \le j < k \le n$ (see e.g., Fill and Machida 1998). We say that the
random variable $X$ is stochastically increasing in the random variable $Y$ if the conditional
distribution of $X$ given $Y = y_1$ stochastically dominates the conditional distribution of $X$ given
$Y = y_2$ whenever $y_1 \ge y_2$. The notation $X \uparrow Y$ will denote this relation, which is not in
general symmetric. Similarly, $X$ is stochastically decreasing in $Y$ (denoted $X \downarrow Y$) if one
has $(X \mid Y = y_1) \preceq (X \mid Y = y_2)$ whenever $y_1 \ge y_2$. A convention in use throughout this
paper is that terms involving inequalities are meant in the weak sense, so that for example "decreasing"
means non-increasing and "positively correlated" means non-negatively correlated.

The relation $X \uparrow Y$ is not in general symmetric, but implies $Y \uparrow X$ in a certain case, as
given in the following proposition.

Proposition 1.2 Let $X$ be a $\{0, 1\}$-valued random variable and $Y$ take values in any totally ordered
set. If $X \uparrow Y$ then $Y \uparrow X$.

Proof: Choose $t$ in the range of $Y$. Since $P(X = 1 \mid Y)$ is increasing in $Y$, it follows that

$$P(X = 1 \mid Y < t) \le \sup_{s < t} P(X = 1 \mid Y = s) \le \inf_{s \ge t} P(X = 1 \mid Y = s) \le P(X = 1 \mid Y \ge t).$$

Thus $X$ and $\mathbf{1}_{Y \ge t}$ are positively correlated and
$P(Y \ge t \mid X = 1) \ge P(Y \ge t \mid X = 0)$. This holding for all $t$ is equivalent to
$Y \uparrow X$. □
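
The proposition can be checked numerically on a finite example. The following sketch is illustrative: the joint law (with $Y$ uniform on $\{0, 1, 2\}$ and hypothetical conditional probabilities $1/4, 1/2, 3/4$ for $X = 1$) and the function names are assumptions, not data from the text.

```python
from fractions import Fraction

def x_increasing_in_y(joint):
    """For binary X: is P(X = 1 | Y = s) nondecreasing in s?

    joint[(x, s)] is the probability of (X, Y) = (x, s).
    """
    ys = sorted({s for _, s in joint})
    cond = [joint[(1, s)] / (joint[(0, s)] + joint[(1, s)]) for s in ys]
    return all(c1 <= c2 for c1, c2 in zip(cond, cond[1:]))

def y_increasing_in_x(joint):
    """Is P(Y >= t | X = 1) >= P(Y >= t | X = 0) for every threshold t?"""
    ys = sorted({s for _, s in joint})
    px = {x: sum(joint[(x, s)] for s in ys) for x in (0, 1)}
    return all(
        sum(joint[(1, s)] for s in ys if s >= t) / px[1]
        >= sum(joint[(0, s)] for s in ys if s >= t) / px[0]
        for t in ys)

# Hypothetical example: Y uniform on {0, 1, 2}, P(X = 1 | Y = s) increasing in s.
success = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(3, 4)}
joint = {}
for s, c in success.items():
    joint[(1, s)] = Fraction(1, 3) * c
    joint[(0, s)] = Fraction(1, 3) * (1 - c)

print(x_increasing_in_y(joint), y_increasing_in_x(joint))
```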

A counterexample to the converse is given by the following probabilities, where the $(i, j)$-cell is the
probability of $(X, Y) = (i, j)$.
1.3 Motivating examples

The property of negative association is reasonably useful but hard to verify. The next subsection builds
the case for "reasonably useful" by cataloging some consequences that would hold if negative
dependence could be established in some cases where it is conjectured. In the present subsection, we list
some examples of systems which are known or believed to have the negative association property. The
examples that are conjectured motivate us to develop techniques for proving that measures have negative
dependence properties. The point of including examples of measures already known to be negatively
associated is that we can use them to study properties of negative association, which will help us refine
our conjectures about the consequences of negative association. As seen in Section 1.5 below, knowledge
of the characteristics of negatively associated variables will be helpful in proving criteria for negative
association.

1. The uniform random spanning tree. Let $G$ be a finite connected graph, and let $T$ be a random
spanning tree (i.e. a maximal acyclic set of edges of $G$) chosen uniformly from among all spanning trees
of $G$. It is easy to prove that the indicator functions $\{X_e\}$ of the events that $e \in T$ have the
following property: for any edges $e$ and $f$, $X_e$ and $X_f$ are negatively correlated. Feder and Mihail
(1992) have shown that in fact this collection is negatively associated. As we will see later, one
concrete consequence of this is that the conditional measures given $e \in T$ and $e \notin T$ may be
coupled to agree except that the latter has precisely one more edge elsewhere.

A natural generalization is to consider weighted spanning trees. Let $W : E(G) \to \mathbb{R}^+$ be a
function assigning positive weights to the edges of $G$. Define the weight $W(T)$ of a tree $T$ to be the
product $\prod_{e \in T} W(e)$ of weights of edges in $T$. The probability measure $\mu$ on
$\{0, 1\}^{E(G)}$ concentrated on spanning trees whose weights $\mu(T)$ are proportional to $W(T)$ is
called the weighted spanning tree measure. Everything known about the uniform spanning tree also holds
for the weighted spanning tree; in fact a rational edge weight of $r/s$ may be simulated in the uniform
spanning tree setting by replacing the edge $e$ by $r$ parallel paths of length $s$ each.
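
On a small graph the weighted spanning tree measure can be enumerated exactly and the pairwise negative correlation checked directly. The sketch below is illustrative: the choice of the complete graph on four vertices, the hypothetical edge weights, and the function names are assumptions made for this example.

```python
from fractions import Fraction
from itertools import combinations

def spanning_trees(vertices, edges):
    """All spanning trees of a connected graph, each as a tuple of edge indices."""
    trees = []
    for subset in combinations(range(len(edges)), len(vertices) - 1):
        parent = {v: v for v in vertices}

        def find(v):
            while parent[v] != v:
                v = parent[v]
            return v

        acyclic = True
        for i in subset:
            ru, rv = find(edges[i][0]), find(edges[i][1])
            if ru == rv:            # edge i would close a cycle
                acyclic = False
                break
            parent[ru] = rv
        if acyclic:                 # |V| - 1 acyclic edges form a spanning tree
            trees.append(subset)
    return trees

def tree_measure(vertices, edges, weight):
    """Probability of each tree, proportional to the product of its edge weights."""
    raw = {}
    for t in spanning_trees(vertices, edges):
        w = Fraction(1)
        for i in t:
            w *= weight[i]
        raw[t] = w
    z = sum(raw.values())
    return {t: w / z for t, w in raw.items()}

def pairwise_negatively_correlated(mu, num_edges):
    prob = lambda *es: sum(q for t, q in mu.items() if all(e in t for e in es))
    return all(prob(e, f) <= prob(e) * prob(f)
               for e, f in combinations(range(num_edges), 2))

vertices = list(range(4))
edges = list(combinations(vertices, 2))                  # complete graph K4
uniform = tree_measure(vertices, edges, [Fraction(1)] * len(edges))
weighted = tree_measure(vertices, edges, [Fraction(w) for w in (1, 2, 3, 1, 2, 3)])
print(len(uniform), pairwise_negatively_correlated(uniform, len(edges)),
      pairwise_negatively_correlated(weighted, len(edges)))
```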

2. Simple exclusion. Let $G$ be a finite graph, let $\eta$ be a function from $V(G)$ to $\{0, 1\}$, and
let $\xi_t$ be the trajectory of a simple exclusion process starting from $\xi_0 = \eta$. The simple
exclusion process is the Markov chain described as follows. For each edge $e$ independently, at times of
a rate 1 Poisson process, the values of $\eta$ at the two endpoints of $e$ are switched. This is thought
of as a particle moving across the edge but only if the opposite site is vacant. Fix $t$ and let
$X_v = \xi_t(v)$ be the indicator function of the occupation of the vertex $v$ at time $t$. It is known
(Liggett 1977) that

$$\mathbb{E} \prod_{v \in S} X_v \le \prod_{v \in S} \mathbb{E} X_v \qquad (5)$$

for any subset $S$ of the vertices of $G$. Are the variables $X_v$ negatively associated? The most natural
generalization of simple exclusion is to allow the Poisson processes on the different edges to have
different rates; the inequality (5) is known in this generality.

3. Random cluster model with $q < 1$. Let $G$ be a finite graph. For any subset $\eta$ of the edges,
viewed as a map $\eta : E(G) \to \{0, 1\}$, let $N(\eta)$ denote the number of connected components of the
graph represented by $\eta$. Given parameters $p \in (0, 1)$ and $q > 0$, define a measure
$\mu = \mu_{p,q}$ on $\{0, 1\}^E$ by letting

$$\mu(\eta) = C \left[ \prod_{e \in E} p^{\eta(e)} (1 - p)^{1 - \eta(e)} \right] q^{N(\eta)}. \qquad (6)$$

Here $C$ is the normalizing constant

$$C^{-1} = \sum_{\eta : E(G) \to \{0,1\}} \left[ \prod_{e \in E} p^{\eta(e)} (1 - p)^{1 - \eta(e)} \right] q^{N(\eta)}.$$
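
The random cluster measure just defined can be tabulated exactly on a small graph. The sketch below is illustrative only: the triangle graph, the hypothetical parameter values, and the function names are assumptions made for this example. It also tests the positive and negative lattice conditions by brute force for one value of $q$ on each side of 1.

```python
from fractions import Fraction
from itertools import product

def count_components(num_vertices, edges, eta):
    """Number of connected components of the graph with the open edges of eta."""
    parent = list(range(num_vertices))

    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v

    count = num_vertices
    for (u, v), is_open in zip(edges, eta):
        if is_open:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                count -= 1
    return count

def random_cluster_measure(num_vertices, edges, p, q):
    """Weights p^|eta| (1-p)^(|E|-|eta|) q^N(eta), normalized to sum to one."""
    raw = {}
    for eta in product((0, 1), repeat=len(edges)):
        k = sum(eta)
        raw[eta] = (p ** k * (1 - p) ** (len(edges) - k)
                    * q ** count_components(num_vertices, edges, eta))
    z = sum(raw.values())
    return {eta: w / z for eta, w in raw.items()}

def lattice_condition(mu, sign):
    """sign = +1 tests the positive lattice condition, sign = -1 the negative one."""
    for x in mu:
        for y in mu:
            join = tuple(max(a, b) for a, b in zip(x, y))
            meet = tuple(min(a, b) for a, b in zip(x, y))
            if sign * (mu[join] * mu[meet] - mu[x] * mu[y]) < 0:
                return False
    return True

triangle = [(0, 1), (1, 2), (0, 2)]
p = Fraction(1, 2)                                   # hypothetical edge parameter
mu_low = random_cluster_measure(3, triangle, p, Fraction(1, 2))   # q < 1
mu_high = random_cluster_measure(3, triangle, p, Fraction(2))     # q > 1
print(lattice_condition(mu_low, -1), lattice_condition(mu_high, +1))
```

The sign of the lattice condition is governed entirely by the factor $q^{N(\eta)}$, since the product over edges contributes equally to both sides of the inequality.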



When $q \ge 1$, the variables $X_e := \eta(e)$ are easily seen to be positively associated by checking the
positive lattice condition and applying the FKG Theorem. When $q \le 1$, the negative lattice condition
holds, but aside from this little is known about the extent of negative dependence. Negative association
and BKRNA are both conjectured to hold, but it is not even known whether the variables $X_e := \eta(e)$
are pairwise negatively correlated under $\mu$. The random cluster model has the uniform spanning tree
model as a limit as $p$, $q$ and $q/p$ go to zero (see Häggström 1995); thus negative association in the
RC model would in a way generalize what is known for spanning trees. The RC model may be generalized by
letting the factor $p$ vary from edge to edge. Thus one has a function $p : E(G) \to (0, 1)$ and the term
$\prod_e p^{\eta(e)} (1 - p)^{1 - \eta(e)}$ is replaced by the more general
$\prod_e p(e)^{\eta(e)} (1 - p(e))^{1 - \eta(e)}$.



4. Occupation of competing urns. Let $n$ urns have $k$ balls dropped in them, where the locations of
the balls are IID chosen from some distribution. Let $X_i$ be the event that urn number $i$ is non-empty.
It is proved in Section 2.3 that these events are negatively associated. Dubhashi and Ranjan (1998)
consider this example at length and show negative association of the occupation numbers of the bins
(numbers of balls in each bin). From this follows negative association of the indicators of exceeding any
prescribed thresholds $a_i$ in bin $i$. Occupation numbers of urns under various probability schemes have
appeared many places. Instead of multinomial probabilities, one can postulate indistinguishability of
urns or balls and arrive at Bose-Einstein or other statistics. Negative association seems only to arise in
the multinomial models, where Mallows (1968) was one of the first to observe negative dependence.
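
For the urn example, the exact law of the occupancy indicators is easy to tabulate when $n$ and $k$ are small. The sketch below is illustrative: the urn probabilities, the number of balls, and the function names are hypothetical choices made for this example. It checks only pairwise negative correlation, which is the weakest consequence of the negative association stated above.

```python
from fractions import Fraction
from itertools import combinations, product

def occupancy_law(probs, k):
    """Exact law of (X_1, ..., X_n), where X_i = 1 if urn i is non-empty
    after k balls are dropped independently according to probs."""
    n = len(probs)
    law = {}
    for balls in product(range(n), repeat=k):
        weight = Fraction(1)
        for urn in balls:
            weight *= probs[urn]
        x = tuple(int(i in balls) for i in range(n))
        law[x] = law.get(x, Fraction(0)) + weight
    return law

def covariance(law, i, j):
    mean = lambda f: sum(p * f(x) for x, p in law.items())
    return mean(lambda x: x[i] * x[j]) - mean(lambda x: x[i]) * mean(lambda x: x[j])

probs = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))   # hypothetical urn probabilities
law = occupancy_law(probs, k=3)
covs = {(i, j): covariance(law, i, j) for i, j in combinations(range(3), 2)}
print(covs)
```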

1.4 Consequences of positive and negative association 

One use that is reasonably general is that of classifying infinite volume limits of Gibbs measures. The
prototypical example is the ferromagnetic Ising model. The ferromagnetic Ising measure on a finite
box $G$ with boundary $B$ and boundary condition $\eta : B \to \{-1, 1\}$ is a measure on spin
configurations $\xi : G \to \{-1, 1\}$ proportional to



The spin variables $\{\xi(x) : x \in G\}$ are positively associated and stochastically increasing in
$\{\eta(y) : y \in B\}$, from which it follows that there are a stochastically greatest and least infinite
volume limit, corresponding to plus and minus boundary conditions respectively. Thus there is
non-uniqueness of the Gibbs state if and only if the plus and minus states differ.

Another example of this is the uniform spanning tree, which is almost Gibbsian except that some
configurations have infinite energy (are forbidden). Let $\mu_n^A$ be the uniform spanning tree measure on
the finite subcube of the $d$-dimensional integer lattice centered at the origin with semi-diameter $n$.
The $A$ refers to a specification of boundary conditions, i.e., of a partition of the vertices of the
boundary of the $n$-cube into components, so that the sample tree is uniform over all spanning forests of
the cube that become trees if each component of $A$ is shrunk to a point. Pemantle (1991) shows that the
measures $\mu_n^{A_n}$ converge weakly to a measure $\mu$ in the case where $A_n$ is the discrete
partition, and uses electrical network theory to show that this same limit holds for any $A_n$. With the
negative association result of Feder and Mihail (1992) it is easy to see this directly as follows.
Iterating the stochastic relation between the conditional measures given $e \in T$ and given
$e \notin T$ shows that $\mu_n^{A} \preceq \mu_n^{A'}$ whenever $A'$ refines $A$. Thus the measures
$\mu_n^{A}$ are stochastically sandwiched between the measures induced by "free" and "wired" boundary
conditions (where $A$ is respectively discrete or a single component); thus the set of limits is
sandwiched between a maximal and minimal limit measure; both must have the same one-dimensional marginals
(by stationarity) and hence must coincide.

Negative association has the further consequence that the uniform spanning tree measure is Very
Weak Bernoulli. Briefly, this means that the conditional measures inside a large box given two
independent realizations of the boundary can be coupled so as to make the expected proportion of
disagreements arbitrarily low. To see that the Uniform Spanning Tree is VWB, note that the number of edges
in a spanning tree is determined by the boundary conditions, so that free boundary conditions will always
yield precisely $|\partial B| - 1$ more edges than wired boundary conditions, where $\partial B$ denotes
the set of vertices in the boundary of a set $B$. Given two boundary conditions $A_1$ and $A_2$, we can
construct a triple $(T_1, T_*, T_2)$ such that $T_1$ is chosen from the measure with boundary conditions
$A_1$, $T_2$ from boundary conditions $A_2$, and $T_*$ from free boundary conditions, and so that $T_*$
contains $T_1$ (construct $(T_1, T_*)$ from the coupling witnessing $T_1 \preceq T_*$ and then construct
$T_2$ given $T_*$ from a coupling witnessing $T_2 \preceq T_*$). Then $T_1$ and $T_2$ differ in fewer than
$2|\partial B|$ places. Question: is there a simultaneous coupling of all boundary conditions such that
the configuration with boundary condition $A$ is a subset of the configuration with boundary condition
$A'$ whenever $A'$ refines $A$? For the reason why this does not immediately follow from stochastic
monotonicity in the boundary conditions, see Fill and Machida (1998).

Positive and negative association may be used to obtain information on the distribution of functionals
such as $\sum_e X_e$. Newman (1980, 1984) shows that under either a positive or negative dependence
assumption, of strength between cylinder dependence and full association, the joint characteristic
function of the variables $\{X_e\}$ is well approximated by the product of individual characteristic
functions. This allows him to obtain central limit theorems for stationary sequences of associated
variables. In the positive association case one needs to assume summable covariances, whereas in the
negative case one gets this for free. It is logical to ask what information may be obtained from negative
association without passing to a limit. For example, since one has a CLT or triangular array theorem in
the independent case, can one prove that negatively associated events are at least as tightly clustered as
independent events? Section 2.4 discusses some conjectures along these lines. Here is a specific
application of these conjectures.

Consider simple exclusion on the one-dimensional integer lattice, with initial configuration given by
$X_v = 1$ for $v \le 0$ and $X_v = 0$ for $v > 0$. What can one say about the number
$N_t := \sum_{v > 0} \xi_t(v)$ of occupied sites to the right of the origin at time $t$? The mean
$\mathbb{E}N_t$ is easy to compute, and an upper bound of $O(t^{1/2})$ on the variance has been obtained
by several people. While this shows that $(N_t - \mathbb{E}N_t)/t^{1/4}$ is tight, it is a far cry from a
limit theorem. It would be nice to be able to obtain a central limit theorem, or, in lieu of that,
Gaussian bounds on the tails of $N_t$. The conjectured chain of implications is: first, the exclusion
model is negatively associated; second, negatively associated measures have sub-Gaussian tails. Negative
association is known [Dubhashi and Ranjan (1998), Proposition 7] to imply the Chernoff-Hoeffding tail
bounds; see conjectures (4) and (5) below for other possible consequences of negative association.



1.5 Feder and Mihail's proof 

Feder and Mihail (1992) prove that a uniform random base for a balanced matroid, of which the uniform
spanning tree measure is a special case, has the negative association property³. They use induction on
the size of the edge set $E$, with the specific nature of the measure entering through only two
properties, (i) and (ii). The logical form of the proof is as follows. Choose an edge $e$ appropriately
and show that property (ii) holds for $(\mu \mid e)$. This together with property (i) for $\mu$ and the
induction hypothesis then imply that $\mu$ is negatively associated.

This argument provides further motivation for deriving consequences of negative association. If 
we can prove, for example, that negative association implies property (ii), then the step where we 
verify property (ii) drops out (by induction!) and the entire argument may be carried out using only 
property (i). Proving something weaker than (ii) for negatively associated measures still reduces the 
work to proving (ii) from this property. We make this all concrete by defining the properties and stating 
the above as a theorem. 

3 This is false for general matroids; see Seymour and Welsh (1975). 
Let $\mathcal{S}$ be a class of measures on Boolean algebras which is closed under conditioning on some of
the coordinate values. Examples of such a class are the uniform or weighted spanning tree measures and
the random cluster measures.

Property (i) pairwise negative correlation: each $\mu \in \mathcal{S}$ makes each pair of distinct $X_e$
and $X_f$ negatively correlated.

Property (ii) some edge correlates with each up-set: for each $\mu \in \mathcal{S}$ and increasing
event $A$ there is an edge $e$ with $\mu(X_e \mathbf{1}_A) \ge \mu(X_e)\mu(A)$.

Theorem 1.3 Let S be a class of measures closed under conditioning and under projection (i.e., for- 
getting some of the variables) and suppose all measures in this class have pairwise negative correlations. 
Then property (ii) for S (implied for example by Conjecture 8 below) implies that every measure in S is 
negatively associated. 

PROOF of theorem: Pick $\mu$ in $\mathcal{S}$ and induct on the rank $n$ of the lattice on which $\mu$ is
a measure. When $n = 1$ the statement is trivial. Now assume the conclusion for all measures in
$\mathcal{S}$ on lattices of rank less than $n$. The remainder of the proof copies the Feder-Mihail
argument. For brevity, we show that $A$ and $B$ are negatively correlated when $B = \{X_e = 1\}$ and $A$
is an arbitrary up-set not depending on the variable $X_e$.

If $P(X_e = X_f = 1) = 0$ for all $f \ne e$ the induction step is trivial, so assume not. By property (ii)
for $(\mu \mid e)$ there is some $f \ne e$ for which

$$\mu(A \mid X_e = X_f = 1) \ge \mu(A \mid X_e = 1). \qquad (7)$$

Now write

$$\mu(A \mid X_e = 1) = \mu(X_f = 1 \mid X_e = 1)\,\mu(A \mid X_e = X_f = 1) + \mu(X_f = 0 \mid X_e = 1)\,\mu(A \mid X_e = 1, X_f = 0)$$

$$\mu(A \mid X_e = 0) = \mu(X_f = 1 \mid X_e = 0)\,\mu(A \mid X_e = 0, X_f = 1) + \mu(X_f = 0 \mid X_e = 0)\,\mu(A \mid X_e = 0, X_f = 0).$$

Comparing terms on the right-hand sides, we see that

(i) $\mu(X_f = 1 \mid X_e = 1) \le \mu(X_f = 1 \mid X_e = 0)$ by the assumption that measures in
$\mathcal{S}$ have pairwise negative correlations;

(ii) $\mu(A \mid X_e = X_f = 1) \le \mu(A \mid X_e = 0, X_f = 1)$ since the conditional law
$(\mu \mid X_f = 1)$ is assumed by induction to be negatively associated and hence $A$ and $X_e$ are
negatively correlated given $X_f = 1$;

(iii) $\mu(A \mid X_e = 1, X_f = 0) \le \mu(A \mid X_e = 0, X_f = 0)$ by the induction hypothesis this
time applied to $(\mu \mid X_f = 0)$;

(iv) $\mu(A \mid X_e = X_f = 1) \ge \mu(A \mid X_e = 1, X_f = 0)$ by the choice of $f$.

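These four imply that the left-hand sides are comparable: $\mu(A \mid X_e = 1) \le \mu(A \mid X_e = 0)$.
Indeed, by (iv) the first right-hand side is a weighted average in which the larger term carries the
weight $\mu(X_f = 1 \mid X_e = 1)$, so by (i) it does not decrease when this weight is replaced by
$\mu(X_f = 1 \mid X_e = 0)$; then (ii) and (iii) bound each term by the corresponding term of the second
right-hand side. This completes the induction in the special case where one of the two upwardly closed
events is a simple event, $\{X_e = 1\}$. The case of a general upwardly closed event is similar (see
Exercise 6.10 in Lyons and Peres 1999).
□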

2 Properties and implications 

2.1 Obtaining measures from other measures 

Before discussing negative dependence properties of various strengths, we consider ways of obtaining a
measure $\mu'$ from a given measure $\mu$ in such a way as to preserve any known or conjectured negative
dependence properties. The reason for discussing these beforehand is to lend perspective to some of the
definitions: if the property is not closed under the map $\mu \mapsto \mu'$, either by definition or by
some argument, then perhaps it is not such a natural property. In what follows, we fix a finite set $E$
and a probability measure $\mu$ on the space $\{0, 1\}^E$.

1. Projection. Given $E' \subseteq E$, let $\mu'$ be the projection of $\mu$ onto $\{0, 1\}^{E'}$. This
corresponds to integrating out (i.e., forgetting) the variables in $E \setminus E'$. Clearly any natural
negative dependence property is closed under projection.

2. Conditioning. Given $A \subseteq E$ and $\eta \in \{0, 1\}^A$, consider the conditional distribution
$(\mu \mid X_e = \eta(e) \text{ for } e \in A)$. It is reasonable to expect these sections of the measure
$\mu$ to be negatively dependent if $\mu$ is. Several of the motivating examples, namely spanning trees,
RC model and the Ising model, are classes of measures closed under conditioning. Note that we are not
allowing conditioning on a set larger than a single atom. To ask that the projection of $\mu$ onto
$\{0, 1\}^{E \setminus A}$ be negatively dependent, conditioned on the event
$\langle X_e : e \in A \rangle \in S$ for arbitrary $S$ is significantly stronger.
3. Products. If $\mu_1$ and $\mu_2$ are negatively dependent, then clearly $\mu_1 \times \mu_2$ should be.

4. Relabeling. The measure $\mu'$ defined by
$\mu'\{X_e = \eta(e) : e \in E\} = \mu\{X_e = \eta(\pi(e)) : e \in E\}$, where $\pi$ is some permutation
of $E$, is of course just a relabeling of $\mu$.

5. Extends the concept of negative correlation. When $|E| = 2$, any reasonable definition reduces to
negative correlation.

6. External field. The name for this property is borrowed from the Ising model. Let
$W : E \to \mathbb{R}^+$ be a non-negative weighting function and let $\mu'$ be the reweighting of $\mu$
by $W$. Specifically, let

$$\mu'\{X_e = \eta(e) : e \in E\} = C \prod_{e \in E} W(e)^{\eta(e)} \, \mu\{X_e = \eta(e) : e \in E\},$$

where $C$ is a normalizing constant. This corresponds to making a particular value for each edge more
or less likely, without introducing any further interaction between the edges. For example if
$W(e) \ne 1$ for a unique $e$, then the probability of $\{X_e = 1\}$ is altered, but the conditional
distributions of $(\mu \mid X_e)$ are unaltered. Many of the classes of measures which motivate our study
are closed under imposition of an external field. For spanning trees or for the RC model, this
corresponds to the weighted case; for the Ising model it corresponds to an external field. Closure under
external fields may seem far from a natural condition for models that are not thermodynamic ensembles,
but this may be more natural than it seems. First, if one believes in closure under conditioning, then
this is the canonical interpolation between conditioning on $X_e = 1$ and conditioning on $X_e = 0$.
Secondly, Karlin and Rinott in 1980 had already proposed a property they call S-MRR$_2$ which is
essentially the negative lattice condition plus closure under projection and external fields (see the
discussion preceding Conjecture 2).
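
The reweighting by an external field is a one-line operation on a finite measure. The sketch below is illustrative: the three-coordinate measure `mu`, the field values, and the function names are hypothetical. It checks the remark above that a field acting on a single coordinate changes the marginal of that coordinate but not the conditional laws given it.

```python
from fractions import Fraction
from itertools import product

def external_field(mu, W):
    """Reweight mu(x) by the product of W[e] over coordinates with x[e] = 1."""
    raw = {}
    for x, p in mu.items():
        factor = Fraction(1)
        for xe, we in zip(x, W):
            if xe:
                factor *= we
        raw[x] = p * factor
    z = sum(raw.values())
    return {x: w / z for x, w in raw.items()}

def conditional(mu, e, value):
    """Law of the remaining coordinates given X_e = value."""
    selected = {x: p for x, p in mu.items() if x[e] == value}
    z = sum(selected.values())
    return {x[:e] + x[e + 1:]: p / z for x, p in selected.items()}

def marginal(mu, e):
    return sum(p for x, p in mu.items() if x[e] == 1)

# Hypothetical measure on {0,1}^3 with some interaction between coordinates 1 and 2.
raw = {x: Fraction(1 + x[0] + 2 * x[1] * x[2]) for x in product((0, 1), repeat=3)}
z = sum(raw.values())
mu = {x: w / z for x, w in raw.items()}

# Field acting only on coordinate 0 (W = 1 elsewhere).
tilted = external_field(mu, (Fraction(3), Fraction(1), Fraction(1)))
print(marginal(mu, 0), marginal(tilted, 0))
```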

2.2 Negative dependence properties and their relations 

We recall the definition of negative association: 

Definition 2.1 $\{X_e : e \in E\}$ are negatively associated (NA) if for every $A \subseteq E$ and every
pair of bounded increasing functions $f : \{0, 1\}^A \to \mathbb{R}$ and
$g : \{0, 1\}^{E \setminus A} \to \mathbb{R}$, $\mathbb{E}fg \le \mathbb{E}f \, \mathbb{E}g$.
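
For very small index sets the definition can be tested exhaustively. The sketch below is illustrative and its names and example measures are assumptions. It relies on the fact that an increasing function on a finite Boolean lattice is a constant plus a nonnegative combination of indicators of up-sets, so it is enough to test the inequality on pairs of up-sets living on complementary coordinate sets.

```python
from fractions import Fraction
from itertools import combinations, product

def up_sets(k):
    """All up-sets of {0,1}^k, each a frozenset of k-tuples."""
    pts = list(product((0, 1), repeat=k))
    leq = lambda x, y: all(a <= b for a, b in zip(x, y))
    found = []
    for r in range(len(pts) + 1):
        for chosen in combinations(pts, r):
            s = frozenset(chosen)
            if all(y in s for x in s for y in pts if leq(x, y)):
                found.append(s)
    return found

def negatively_associated(mu, n):
    """Brute-force test of E[fg] <= E[f]E[g] over up-set indicators f, g
    depending on a coordinate set A and on its complement."""
    coords = range(n)
    for r in range(1, n):
        for A in combinations(coords, r):
            B = tuple(i for i in coords if i not in A)
            for U in up_sets(len(A)):
                for V in up_sets(len(B)):
                    in_u = lambda x: tuple(x[i] for i in A) in U
                    in_v = lambda x: tuple(x[i] for i in B) in V
                    pu = sum(p for x, p in mu.items() if in_u(x))
                    pv = sum(p for x, p in mu.items() if in_v(x))
                    puv = sum(p for x, p in mu.items() if in_u(x) and in_v(x))
                    if puv > pu * pv:
                        return False
    return True

# Uniform measure on configurations with exactly two ones: the uniform
# spanning tree of a triangle, with coordinates indexed by its edges.
two_of_three = {x: Fraction(1, 3) for x in product((0, 1), repeat=3) if sum(x) == 2}
# A positively dependent measure for contrast (hypothetical example).
all_or_nothing = {(0, 0, 0): Fraction(1, 2), (1, 1, 1): Fraction(1, 2)}
print(negatively_associated(two_of_three, 3), negatively_associated(all_or_nothing, 3))
```

The number of up-sets grows doubly exponentially, so this check is feasible only for a handful of variables.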

Unfortunately, this property is not closed under conditioning or external fields (see Example 2 below).
This may be an indication that these two closures are not so natural after all, but on the other hand it
makes sense, at least for closure under conditioning, to make a new definition:

Definition 2.2 The measure $\mu$ is conditionally negatively associated (CNA) if each measure $\mu'$
gotten from $\mu$ by conditioning on some (or none) of the values of the variables is negatively
associated.

Since the operation of conditioning is easy to understand in many of our motivating examples, this
extension should not prove too unwieldy.

The weakest possible negative dependence property is pairwise negative correlation:
$\mu(X_e X_f) \le \mu(X_e)\mu(X_f)$. For real-valued random variables, there is a stronger pairwise
property, called negative quadrant dependence (NQD) in Newman (1984), after Lehmann (1966). Say that $X$
and $Y$ are NQD if

$$P(X > a, Y > b) \le P(X > a)P(Y > b)$$

for all $a$ and $b$. For binary-valued random variables, this reduces to simple correlation. A stronger
property, called negative regression dependence (in analogy with positive regression dependence, cf.
Esary, Proschan and Walkup 1967), is defined by requiring the conditional distribution of $X$ given $Y$ to
be stochastically decreasing in $Y$: $P(X > t \mid Y = s)$ is decreasing in $s$ for each $t$. For
binary-valued variables this again reduces to negative correlation. When $X$ and $Y$ are vectors,
$X := \langle X_e : e \in A \rangle$, $Y := \langle X_e : e \notin A \rangle$, this would say that the
conditional joint distribution of $\{X_e : e \in A\}$ given $\{X_e : e \notin A\}$ should be
stochastically decreasing in the values conditioned on. Thus we have a definition:

Definition 2.3 Say that the variables $\{X_e : e \in E\}$ are jointly negative regression dependent (JNRD)
if the vectors $\langle X_e : e \in A \rangle$ and $\langle X_e : e \notin A \rangle$ are always negative
regression dependent. Equivalently, require that for any increasing event $H$ measurable with respect to
$\{X_e : e \in A\}$, $\mu(H \mid X_e : e \notin A)$ is decreasing with respect to the partial order on
$\{0, 1\}^{A^c}$.

Unraveling the definitions, one sees that conditional negative association implies JNRD, since JNRD
is simply CNA in the special case where one has conditioned on $\{X_e : e \in A^c\} \setminus \{f\}$ and
then asks for $X_f$ to be negatively correlated with $\mathbf{1}_H$ for any increasing event $H$
measurable with respect to $\{X_e : e \in A\}$.

The negative lattice condition

$$\mu(x \vee y)\,\mu(x \wedge y) \le \mu(x)\,\mu(y) \qquad (8)$$

is closed under five of the six closure operations, but the missing one, projection, is crucial. This is
what makes the negative version of the FKG theorem fail. Accordingly,

Definition 2.4 Say that $\{X_e : e \in E\}$ satisfy the hereditary negative lattice condition (h-NLC) if
every projection satisfies the negative lattice condition.

It is easy to see that JNRD implies h-NLC, since h-NLC is the special case where $A$ is a singleton.

None of the three properties CNA, JNRD or the hereditary NLC is closed under imposition of an 
external field (see Example 1 below). Projecting from index set $S$ to $S'$ and then imposing an external 
field (on $S'$) is the same as imposing an external field which is trivial on $S \setminus S'$ and then projecting to 
$S'$. Thus any sequence of projections and external fields may be written as one external field followed by 
one projection. One may define three stronger properties, CNA+, JNRD+ and h-NLC+, which are that 
the corresponding properties hold for the given measure and for all measures obtained from the given 
measure by imposition of an external field and a projection; these properties are then by definition closed 
under external fields and projections. While these stronger properties are difficult to check directly, they 
appear to hold for the motivating examples and are introduced in the hope that they do in fact hold 
there and are strong enough to be useful in inductive arguments such as the proof of Theorem 1.3. The 
property h-NLC+ is called S-MRR$_2$ by Karlin and Rinott (1980), according to terminology they develop 
mainly for continuous random variables. 

The terminology introduced thus far can be summarized with a diagram of implications. 

    CNA+  ==>  JNRD+  ==>  h-NLC+  (S-MRR_2)
     ||          ||           ||
     \/          \/           \/
    CNA   ==>  JNRD   ==>  h-NLC
     ||
     \/
     NA

Figure 1 



2.3 Conjectures, examples and counterexamples 

The vertical implications in Figure 1 are strict, as shown by the examples which follow in this section. 
Whether the horizontal implications are strict is an open question: 

Conjecture 2 All three properties CNA+, JNRD+ and h-NLC+ are equivalent. 

Another immediate question is whether anything other than CNA is strong enough to imply negative 
association. 



Conjecture 3 Strong version: h-NLC implies NA. Weak version: h-NLC+ implies NA. 

Examples showing the vertical implications are not equivalences are as follows (verified by brute 
force). 

Example 1: Suppose $n = 3$, and the probabilities for the various possible atoms are proportional to the 
following: 

    $P(X_1 = 0, X_2 = 0, X_3 = 0) = 16$ 
    $P(X_1 = 0, X_2 = 0, X_3 = 1) = 8$ 
    $P(X_1 = 0, X_2 = 1, X_3 = 0) = 8$ 
    $P(X_1 = 0, X_2 = 1, X_3 = 1) = 4$ 
    $P(X_1 = 1, X_2 = 0, X_3 = 0) = 12 + \epsilon$ 
    $P(X_1 = 1, X_2 = 0, X_3 = 1) = 4$ 
    $P(X_1 = 1, X_2 = 1, X_3 = 0) = 4$ 
    $P(X_1 = 1, X_2 = 1, X_3 = 1) = 1.$ 

When $0 \le \epsilon \le 0.8$ this measure satisfies CNA and hence JNRD and h-NLC. However, when $\epsilon > 0$, 
applying the external field $(\lambda, 1, 1)$ for any positive $\lambda < 4\epsilon/(4 - \epsilon)$ yields a measure in which $X_2$ 
and $X_3$ are positively correlated, thus violating h-NLC and hence JNRD and CNA. (Indeed, after the field is 
imposed and $X_1$ is projected away, the quantity $\mu(1,1)\mu(0,0) - \mu(1,0)\mu(0,1)$ for the pair $(X_2, X_3)$ is 
proportional to $\lambda[4\epsilon - (4 - \epsilon)\lambda]$.) This shows the first three vertical implications in Figure 1 are strict. 

Example 2: Suppose $n = 3$, and the probabilities for the various possible atoms are in the proportions: 

    $P(X_1 = 0, X_2 = 0, X_3 = 0) = 0$ 
    $P(X_1 = 0, X_2 = 0, X_3 = 1) = 1$ 
    $P(X_1 = 0, X_2 = 1, X_3 = 0) = 1$ 
    $P(X_1 = 0, X_2 = 1, X_3 = 1) = 10\epsilon$ 
    $P(X_1 = 1, X_2 = 0, X_3 = 0) = 1$ 
    $P(X_1 = 1, X_2 = 0, X_3 = 1) = 1$ 
    $P(X_1 = 1, X_2 = 1, X_3 = 0) = 10\epsilon$ 
    $P(X_1 = 1, X_2 = 1, X_3 = 1) = \epsilon.$ 

Here the negative lattice condition fails on the four atoms having $X_2 = 1$ (since $\epsilon \cdot 1 > (10\epsilon)^2$ for 
small $\epsilon$); thus CNA, JNRD and h-NLC (in fact NLC) all fail, whereas the variables are in fact negatively 
associated. Thus the lowest vertical implication in Figure 1 is strict as well. 
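
The failure of the negative lattice condition in Example 2 can be checked mechanically. The following is a minimal sketch, not the brute-force verification used for the paper: the function names are hypothetical, the weight of the all-zero atom is assumed to be 0, and only condition (8) is tested (negative association itself is not checked here).

```python
from fractions import Fraction

def example2_weights(eps):
    """Unnormalized atom weights of Example 2, keyed by (x1, x2, x3).

    Assumption: the all-zero atom has weight 0.
    """
    return {
        (0, 0, 0): Fraction(0), (0, 0, 1): Fraction(1),
        (0, 1, 0): Fraction(1), (0, 1, 1): 10 * eps,
        (1, 0, 0): Fraction(1), (1, 0, 1): Fraction(1),
        (1, 1, 0): 10 * eps,    (1, 1, 1): eps,
    }

def nlc_violations(weights):
    """List pairs (x, y) with mu(x v y) * mu(x ^ y) > mu(x) * mu(y)."""
    bad = []
    atoms = sorted(weights)
    for a, x in enumerate(atoms):
        for y in atoms[a + 1:]:
            join = tuple(max(s, t) for s, t in zip(x, y))
            meet = tuple(min(s, t) for s, t in zip(x, y))
            if weights[join] * weights[meet] > weights[x] * weights[y]:
                bad.append((x, y))
    return bad

# The only violating pair lies among the atoms with x2 = 1.
print(nlc_violations(example2_weights(Fraction(1, 1000))))
```

With exact rational arithmetic the violation appears exactly when $\epsilon < 1/100$; for larger $\epsilon$ the list is empty.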

The following lemma will be useful on a number of occasions. The easy inductive proof is omitted. 

Lemma 2.5 Let $Y_1, \ldots, Y_n$ be random variables taking values in a partially ordered set and suppose 
they have the Markov property, namely that $Y_1, \ldots, Y_{k-1}$ are independent from $Y_{k+1}, \ldots, Y_n$ given $Y_k$. 
Suppose also that each $Y_{k+1}$ is either stochastically increasing or decreasing in $Y_k$. Then $Y_n$ is either 
stochastically increasing in $Y_1$ or stochastically decreasing in $Y_1$, according to whether the number of 
indices $k$ for which $Y_{k+1}$ is decreasing in $Y_k$ is even or odd. □ 

We conclude this subsection with a proof that the competing urn model of Example 4 is negatively 
associated. The result with general thresholds is proved in Dubhashi and Ranjan (1998), but the proof 
given here is independent of that. 

Proof that the urn model is negatively associated: Fix $1 \le r < n$ and let $A$ and $A'$ be 
up-events measurable with respect to $\{X_i : i \le r\}$ and $\{X_i : i > r\}$ respectively. Let $V$ and $V'$ be the 
total number of balls dropped into urns $i$ with $i \le r$ and $i > r$ respectively. Letting $Y_1$ be the indicator 
function of $A$, $Y_4$ be the indicator function of $A'$, $Y_2 = V$ and $Y_3 = V'$, it is clear that $Y_1, Y_2, Y_3, Y_4$ 
has the Markov property. I claim also that $A$ is stochastically increasing in $V$ and $A'$ is stochastically 
increasing in $V'$. By symmetry, consider only $A$ and $V$. Observe that conditional on $V = m$, the draws 
are exchangeable in the usual sense (definition below), so we may condition on the first $m$ draws being 
those that went in urns $i \le r$. Then the distribution of balls given $V = m$ and the distribution of balls 
given $V = m + 1$ may be coupled so that the latter is always the former plus an extra ball somewhere. 
This establishes the claim. It is similarly easy to show that $V'$ is stochastically decreasing in $V$. By 
Proposition 1.2, $V$ is stochastically increasing in $A$. Then the hypothesis of the above lemma is satisfied 
with stochastic increase for $k = 1$ and $k = 3$ and stochastic decrease when $k = 2$; it follows that $A'$ is 
stochastically decreasing in $A$, which proves negative association. □ 

2.4 The exchangeable case and the rank sequence 

The variables $\{X_1, \ldots, X_n\}$ are said to be exchangeable if their joint distribution is invariant under 
permutation. In the case of binary-valued random variables, this is the same as saying that 
$\mu\{X_k = \eta(k) : 1 \le k \le n\}$ depends only on $\sum_k \eta(k)$. A fair amount of intuition may be gained from this 
special case. The conjectured equivalences in Figure 1 are proved in this case, but more importantly, 
new conjectures come to light that ought to hold in the general case as well. 

For a measure $\mu$ on $\mathcal{B}_n$, define the rank sequence $\{a_k : 0 \le k \le n\}$ by $a_k := \mu\{\sum_{j=1}^n X_j = k\}$. Thus 
$\{a_k : 0 \le k \le n\}$ gives the total probabilities for the $n + 1$ ranks of the Boolean lattice $\mathcal{B}_n$. If the 
random variables $\{X_j\}$ are exchangeable, then $\mu$ is completely characterized by its rank sequence, with 
$\mu\{X_j = \eta(j) : 1 \le j \le n\} = a_k / \binom{n}{k}$ for $k = \sum_j \eta(j)$. In this case, the negative lattice condition (8) 
boils down to log-concavity of the sequence $\{a_k / \binom{n}{k}\}$ (a positive sequence is said to be log-concave if 
$a_k^2 \ge a_{k-1} a_{k+1}$). This motivates the following definition. 

Definition 2.6 A finite sequence $\{a_k : 0 \le k \le n\}$ is said to be Ultra-Log-Concave (ULC) if the nonzero 
terms of the sequence $\{a_k / \binom{n}{k}\}$ form a log-concave sequence and the indices of the nonzero terms form 
an interval. 

Convention: From now on, to avoid trivialities, we include in the definition of log-concavity that 
the indices of the nonzero terms form an interval. It will be useful later to note that log-concavity is 
conserved by convolutions and pointwise products. 
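
Definition 2.6 and the convention above translate directly into a short test. The sketch below is illustrative only; the function names `is_log_concave` and `is_ulc` are hypothetical, and exact rational arithmetic is used so that equalities are decided correctly.

```python
from fractions import Fraction
from math import comb

def is_log_concave(seq):
    """Log-concavity in the sense of the Convention: a_k^2 >= a_{k-1} a_{k+1}
    and the nonzero terms occupy an interval of indices."""
    support = [k for k, a in enumerate(seq) if a != 0]
    if support and support != list(range(support[0], support[-1] + 1)):
        return False
    return all(seq[k] ** 2 >= seq[k - 1] * seq[k + 1]
               for k in range(1, len(seq) - 1))

def is_ulc(rank_seq):
    """Ultra-log-concavity: a_k / C(n, k) is log-concave, n = len(rank_seq) - 1."""
    n = len(rank_seq) - 1
    return is_log_concave([Fraction(a) / comb(n, k)
                           for k, a in enumerate(rank_seq)])

print(is_ulc([1, 4, 6, 4, 1]))                       # binomial rank sequence
print(is_log_concave([1, 1, 1]), is_ulc([1, 1, 1]))  # log-concave but not ULC
```

The second example shows that ULC is strictly stronger than log-concavity: dividing $(1, 1, 1)$ by the binomial coefficients $(1, 2, 1)$ gives $(1, 1/2, 1)$, which is not log-concave.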

The significance of Ultra-Log-Concavity in the general case is still conjectural, but in the exchangeable 
case it is given by the following theorem whose proof appears at the end of the section. 

Theorem 2.7 Suppose that $\{X_j\}$ are exchangeable. Then the six conditions CNA+, JNRD+, h-NLC+, 
CNA, JNRD and h-NLC (see Figure 1) are equivalent to Ultra-Log-Concavity of the rank sequence $\{a_k\}$. 
This is trivially equivalent to the negative lattice condition (8). 

Call the measure $\mu$ (not necessarily exchangeable) a ULC measure if its rank sequence is ULC, and 
use the term ULC+ to denote a measure such that any measure obtained from it by external fields and 
projections is ULC. The following conjectures, if true, imply a large role for the ULC property in the 
study of negative dependence. They have been checked only for lattices of rank up to 4. 

Conjecture 4 The strongest version of this conjecture is that any negatively associated measure is ULC. 
For a weaker version, replace the hypothesis of NA by any of the other six stronger conditions in Figure 1. 

Conjecture 5 In the RC model, the sum $\sum_{e \in S} X_e$ over any subset $S$ has a ULC rank sequence. The 
same holds for the competing urns model. In the exclusion model, the total number of occupied sites in 
any set $S$ at any time $t$ has a ULC rank sequence. 

Remark: The ULC property for number of edges present from a given subset in a uniform (or weighted) 
random spanning tree is a subcase of the conjecture for the RC model. For spanning trees, this would 
sharpen a result of Stanley (1981) showing that the rank sequence for a uniform random base of a 
unimodular matroid (of which the uniform spanning tree is a special case) is log-concave. 

Conjecture 4 or the weaker Conjecture 5 would serve two purposes. Firstly, the ULC property implies tail 
estimates on a distribution. Secondly, Conjecture 4 would imply that the ULC property is a 
necessary condition for negative association, which helps to narrow and define our search for the "right" 
negative dependence property. 

The fact that ULC implies CNA+ et al in the exchangeable case leads one to believe that ULC+ 
might be enough to imply negative dependence in general: 

Conjecture 6 If $\mu$ is ULC+ then $\mu$ is CNA (hence CNA+) and in particular $\mu$ is negatively associated. 

Unlike the previous two, this conjecture is not particularly useful, since the hypothesis of ULC+ is 
hard to check. It would, however, have philosophical value: supposing there to be a useful definition of 
negative dependence still lurking out there, we have been approximating it from the weak side, finding 
criteria that certainly hold for any such definition; the foregoing conjecture strengthens our previous 
approximation by adding the property ULC+. 

A final philosophical observation belongs in this section. If Ultra-Log-Concavity is, as conjectured, 
a property of all negatively dependent measures, then the class of ULC sequences must be closed under 
convolution. Indeed, if $\mu_1$ and $\mu_2$ are two exchangeable measures with ULC rank sequences, then by 
Theorem 2.7 they are negatively dependent in all senses we can imagine, so their product must be as 
well. The rank sequence for the product is the convolution of the rank sequences, so unless even our 
understanding of the exchangeable case is nil, the following conjecture must be true. Embarrassingly, 
in the previously circulated draft of this paper, there was no proof of the following conjecture. It has 
recently been proved by Liggett (1997). 

Conjecture 7 (Now proved by Liggett) The convolution of two ULC sequences is ULC. 
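
The closure of ULC sequences under convolution can be illustrated numerically. The following sketch is only a sanity check on small cases, not a substitute for the proof; the helper names are hypothetical and the exhaustive search is restricted, as an assumption made for brevity, to strictly positive integer sequences of length 3 with entries at most 6.

```python
from fractions import Fraction
from itertools import product
from math import comb

def is_ulc(a):
    """ULC test for a strictly positive rank sequence a_0, ..., a_n."""
    n = len(a) - 1
    q = [Fraction(x, comb(n, k)) for k, x in enumerate(a)]
    return all(q[k] ** 2 >= q[k - 1] * q[k + 1] for k in range(1, n))

def convolve(a, b):
    """Convolution of two finite sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

# A worked pair: both factors are ULC with equality in the defining inequality.
print(convolve([4, 4, 1], [9, 6, 1]))

# Exhaustive search over a small family of ULC sequences.
ulc_seqs = [s for s in product(range(1, 7), repeat=3) if is_ulc(s)]
counterexamples = [(a, b) for a in ulc_seqs for b in ulc_seqs
                   if not is_ulc(convolve(a, b))]
print(len(counterexamples))
```

In the worked pair the convolution is $(36, 60, 37, 10, 1)$, whose normalized sequence $(36, 15, 37/6, 5/2, 1)$ is log-concave with very little room to spare, which suggests why the conjecture was not trivial.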

This section concludes with a proof of Theorem 2.7. Begin with the following two lemmas. 

Lemma 2.8 Let $\mu$ be an exchangeable measure with ULC rank sequence. Suppose the measure $\mu'$ is 
obtained from $\mu$ by imposing an external field at coordinates $1, \ldots, k$ (i.e., $W(j) = 1$ for $j > k$) and 
then projecting onto coordinates $r + 1, \ldots, n$ for some $r \ge k$. Then $\mu'$ is exchangeable with ULC rank 
sequence. 

Proof: The exchangeability of $\mu'$ is clear. To see that $\mu'$ has ULC rank sequence, it suffices to consider 
the case $r = 1$. [Reason: defining $\mu_j$ to be the measure gotten by imposing the external field on the first 
$j$ coordinates and projecting onto the last $n - j$ coordinates, one sees by induction on $j$ that $\mu_r = \mu'$ 
will have the desired property.] So we assume without loss of generality that $k = r = 1$. 

Let $\lambda$ denote $W(1)$. Let $a_j$ (respectively $a'_j$) denote the rank sequence for $\mu$ (respectively $\mu'$) and let 
$q_j$ (respectively $q'_j$) denote $a_j / \binom{n}{j}$ (respectively $a'_j / \binom{n-1}{j}$). Then 

$$q'_j = C\,(q_j + \lambda q_{j+1}),$$

where $C$ is the normalizing constant for the external field. By assumption, $\{q_j\}$ is log-concave, and 
hence for any $i < j$, $q_i q_j \le q_{i+1} q_{j-1}$. The proof is now a simple calculation. 

$$C^{-2}\bigl[(q'_j)^2 - q'_{j-1} q'_{j+1}\bigr] = q_j^2 + 2\lambda q_j q_{j+1} + \lambda^2 q_{j+1}^2 - q_{j-1} q_{j+1} - \lambda q_{j-1} q_{j+2} - \lambda q_j q_{j+1} - \lambda^2 q_j q_{j+2}$$

$$= [q_j^2 - q_{j-1} q_{j+1}] + \lambda [q_j q_{j+1} - q_{j-1} q_{j+2}] + \lambda^2 [q_{j+1}^2 - q_j q_{j+2}].$$

This is the sum of three nonnegative quantities, so it is nonnegative, proving log-concavity of $\{q'_j\}$, which is 
equivalent to $\{a'_j\}$ being ULC. □ 
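
The case $k = r = 1$ of Lemma 2.8 is easy to reproduce numerically. The sketch below is an illustration under stated assumptions: the sequence `q` and the field strength `lam` are arbitrary choices, the function names are hypothetical, and normalizing constants are dropped since they do not affect log-concavity.

```python
from fractions import Fraction
from itertools import product

def field_then_project(q, lam):
    """Per-atom weights by rank after a field lam is imposed on coordinate 1
    and that coordinate is projected away: q'_j = q_j + lam * q_{j+1}."""
    return [q[j] + lam * q[j + 1] for j in range(len(q) - 1)]

def brute_force(q, lam):
    """The same operation carried out atom by atom on {0,1}^n, n = len(q) - 1."""
    n = len(q) - 1
    out = {}
    for x in product((0, 1), repeat=n):
        w = q[sum(x)] * (lam if x[0] else 1)
        out[x[1:]] = out.get(x[1:], 0) + w
    return out

def is_log_concave(seq):
    return all(seq[k] ** 2 >= seq[k - 1] * seq[k + 1]
               for k in range(1, len(seq) - 1))

q = [Fraction(1), Fraction(3), Fraction(4), Fraction(2)]  # log-concave, n = 3
lam = Fraction(1, 2)
q_new = field_then_project(q, lam)
print(q_new, is_log_concave(q_new))
```

The brute-force version confirms that the projected weights depend only on the rank of the remaining coordinates, which is the exchangeability claim at the start of the proof.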

Lemma 2.9 Let $\mu^*$ be a measure obtained from an exchangeable measure $\mu'$ with rank sequence $\{a'_k\}$ 
by imposing an external field $W$. Let $Y_1$ and $Y_4$ be the respective indicator functions of $A$ and $A'$, 
events measurable with respect to disjoint sets $S$ and $S'$. Let $Y_2 = \sum_{e \in S} X_e$ and $Y_3 = \sum_{e \in S'} X_e$. Then 
the sequence $\{Y_i\}$ is Markov. Furthermore, the conditional laws $(\mu^* \mid \sum X(e) = k)$ are stochastically 
increasing in $k$ and the same holds for any projection of $\mu^*$ in place of $\mu^*$. 

Proof: Let $\mu', \mu^*, A, A', S, S'$ and $\{Y_i\}$ be as in the hypotheses. The probabilities for $\mu^*$ are given as 
follows, with $C$ being a normalizing constant as usual: 

$$\mu^*\{X_e = \eta(e), \text{ all } e\} = C \prod_e W(e)^{\eta(e)} \, \frac{a'_k}{\binom{n}{k}}$$

where $k = \sum_e \eta(e)$. From this, one gets the conditional probability 

$$\mu^*(X_e = \eta(e) : e \notin S' \mid X_e = \eta(e) : e \in S') = C \prod_{e \notin S'} W(e)^{\eta(e)} \, \frac{a'_k}{\binom{n}{k}}.$$

This does not depend on the values of $\eta$ on $S'$ except through $\sum_{e \in S'} \eta(e)$, which proves the Markov 
property. For the stochastic increase, note that the conditional distributions of $\mu^*$ given $\sum_e X(e) = k$ are 
the same as the law of independent Bernoulli random variables with $P(X(e) = 1) = W(e)/(1 + W(e))$, 
conditioned on $\{\sum_e X(e) = k\}$. The same holds for any projection of $\mu^*$. There are elementary proofs 
that these laws increase stochastically in $k$, but in the context of this paper, the easiest argument is 
to add an extra variable $X(e^*)$ and apply the Feder-Mihail result to the balanced matroid gotten by 
conditioning on $\sum X(e) = k + 1$ and to the conditional measures given $X(e^*) = 0$ and $X(e^*) = 1$. □ 

Proof of Theorem 2.7: It is clear that ULC is equivalent to the negative lattice condition and hence 
is implied by h-NLC. To show that ULC implies the other six conditions we work up the ladder. First, 
if $\mu$ is exchangeable and ULC, then Lemma 2.8 shows that all projections of $\mu$ are as well, which means 
that the NLC holds hereditarily, giving h-NLC. In fact, the lemma is enough to give h-NLC+, since 
any $\mu^*$ obtained from $\mu$ may be described (after re-ordering of coordinates) as some measure $\mu'$ as in 
the lemma, on which has been imposed an external field (that is, any sequence of external fields and 
projections may be written as an external field that affects only those indices not appearing in the final 
measure, followed by a single projection, followed by an external field); Lemma 2.8 implies $\mu'$ satisfies 
the negative lattice condition (8); this is invariant under external fields, so $\mu^*$ satisfies (8) as well. 

Next, we show that for any measure $\mu^*$ obtained from an exchangeable measure $\mu$ by external 
fields and projections, JNRD implies CNA. This will show that JNRD+ implies CNA+ as well as 
showing JNRD implies CNA. To show this, let $\mu^*$ be such a measure. Let $A$ and $A'$ be any up-events 
measurable with respect to disjoint sets of coordinates $S$ and $S'$. Define a sequence of random variables 
$Y_1, Y_2, Y_3, Y_4$ by letting $Y_1$ be the indicator of $A$, letting $Y_4$ be the indicator of $A'$, letting $Y_2 = \sum_{e \in S} X_e$, 
and letting $Y_3 = \sum_{e \in S'} X_e$. Apply Lemma 2.9 to see that $\{Y_i\}$ is Markov. Lemma 2.5 finishes the 
argument once we know that $Y_2$ is stochastically increasing in $Y_1$, $Y_3$ is stochastically decreasing in $Y_2$, 
and $Y_4$ is stochastically increasing in $Y_3$. Applying the last statement of Lemma 2.9 to the projection 
of $\mu^*$ onto $\{0,1\}^{S'}$, we see that the conditional joint law of $\{X(e) : e \in S'\}$ given $\sum_{e \in S'} X(e) = k$ 
increases stochastically in $k$, which says precisely that $Y_4$ is stochastically increasing in $Y_3$. The same 
argument with $S$ in place of $S'$ shows that $Y_1$ is stochastically increasing in $Y_2$. By Proposition 1.2, 
$Y_2$ is stochastically increasing in $Y_1$. Finally, to see that $Y_3$ is stochastically decreasing in $Y_2$, write the 
conditional distribution of $Y_3$ given $\{Y_2 = k\}$ as an integral 

$$\int \mu^*(Y_3 \in \cdot \mid X(e) = \eta(e) : e \in S) \, d\nu_k(\eta),$$

where $\nu_k$ is the mixing measure 

$$\nu_k(\eta) = \mu^*\Bigl\{X(e) = \eta(e) : e \in S \Bigm| \sum_{e \in S} X(e) = k\Bigr\}.$$

We have seen that $\nu_k$ is stochastically increasing in $k$. By the hypothesis that $\mu^*$ is JNRD, the integrand 
decreases stochastically when $\eta$ increases in the natural partial order, and hence the integral stochastically 
decreases in $k$. This finishes the proof that JNRD implies CNA. 

It remains to show that h-NLC (respectively h-NLC+) implies JNRD (respectively JNRD+). The 
+ case will be shown in Section 3.2 below, in the proof of Theorem 3.1, so we prove here only that 
ULC implies JNRD for exchangeable measures. It suffices to show that the conditional distribution of 
$\sum_{e \ne f} X(e)$ given $X(f) = 0$ stochastically dominates the distribution of $\sum_{e \ne f} X(e)$ given $X(f) = 1$, since 
in the definition of JNRD, comparing the conditional probabilities of any two neighbors in the Boolean 
lattice $\{0,1\}^{A^c}$ reduces to comparing conditional probabilities given one value $X(f)$ after conditioning 
on all other values of $X(g)$, $g \in A^c$, and such conditioning produces another exchangeable ULC measure. 
It further suffices to show that $\sum_{e \ne f} X(e)$ is stochastically decreasing in $X_f$, since this is sufficient for the 
distribution of $\{X(e) : e \ne f\}$ given $X(f)$. 

Let $\{a_j\}$ be the rank sequence for a ULC exchangeable measure $\mu$, and let $\{q_j\}$ be the sequence 
$\{a_j / \binom{n}{j}\}$ as before. Then 

$$\mu\Bigl(X(f) = 0, \sum_{e \ne f} X(e) = r\Bigr) = \binom{n-1}{r} q_r$$

and 

$$\mu\Bigl(X(f) = 1, \sum_{e \ne f} X(e) = r\Bigr) = \binom{n-1}{r} q_{r+1}.$$

Thus we need to show that for all $k < n$, 

$$\frac{\sum_{r \le k} \binom{n-1}{r} q_{r+1}}{\mu(X(f) = 1)} \ge \frac{\sum_{r \le k} \binom{n-1}{r} q_r}{\mu(X(f) = 0)}.$$

Cross-multiply and replace the quantities $\mu(X(f) = x)$ with the sum over $s$ of $\mu(X(f) = x, \sum_{e \ne f} X(e) = s)$ 
to transform this into 

$$\sum_{r \le k;\; s \le n-1} \binom{n-1}{r} \binom{n-1}{s} q_{r+1} q_s \ge \sum_{r \le k;\; s \le n-1} \binom{n-1}{r} \binom{n-1}{s} q_r q_{s+1}.$$

Canceling terms appearing on both sides reduces the range of the sum to $r \le k < s$. But for $r < s$, 
log-concavity of $\{q_j\}$ implies that $q_{r+1} q_s \ge q_r q_{s+1}$, which establishes the last inequality via term-by-term 
comparison and finishes the proof that ULC implies JNRD. □ 



3 Inductively defined classes of negatively dependent measures 

At this point it is worth examining the possibility that the many negative dependence properties in our 
desiderata are not mutually satisfiable. It is easy to see from the definition that the class of CNA+ 
measures is closed under products, projections and external fields, so we have at least one existence 
result: 

Let $\mathcal{S}_0$ be the smallest class of measures containing all exchangeable ULC measures and 
which is closed under products, projections and external fields. Then $\mathcal{S}_0$ is contained in the 
class of CNA+ measures. □ 

Supposing there to exist a natural and useful class of "negatively dependent measures", it is contained 
in the class of CNA+ measures, and certainly contains the class $\mathcal{S}_0$. This section aims to improve the 
latter bound, which seems, intuitively, to be further from the mark. 

3.1 Further closure properties 

The class $\mathcal{S}_0$ is trivial, since products commute with external fields, and therefore $\mathcal{S}_0$ may be seen to 
contain only products of exchangeable ULC measures, on which have been imposed external fields. We 
may enlarge the class $\mathcal{S}_0$ either by including more measures in the base set or by increasing the number 
of closure operations in the inductive step. I will begin the discussion with a list of additional candidates 
for closure properties to those already listed in Section 2.1. 

7. Symmetrization. Given a measure $\mu$ on $\mathcal{B}_n$, let $\mu'$ be the exchangeable measure with 
$\mu'(\sum_j X_j = k) = \mu(\sum_j X_j = k)$. In other words, $\mu' = (1/n!) \sum_{\pi \in S_n} \mu \circ \pi$. Since the measure $\mu'$ is exchangeable, we 
know criteria for $\mu'$ to be negatively associated, and therefore closure under symmetrization boils down 
to Conjecture 4 for the class of negatively dependent measures. 

8. Partial Symmetrization. One could strengthen the preceding closure property by allowing 
symmetrization of only a subset of the coordinates; for example, one could take $\mu' = (\mu + \mu \circ \pi)/2$ where 
$\pi$ is a transposition. If one broadens this to taking $\mu' = (1 - \epsilon)\mu + \epsilon\, \mu \circ \pi$, then by iterating these 
with $\epsilon \to 0$, one obtains closure under an arbitrary time-inhomogeneous stirring operation. That is, let 
$\{\pi_t : t \ge 0\}$ be an $S_n$-valued Markov process, with transitions from $\pi$ to $\tau \circ \pi$ at rates $C(\tau, t)$ 
for each transposition $\tau$, where the functions $C(\tau, t)$ are some arbitrary real functions. Fix $T > 0$ and 
let $\mu' = \mu \circ \pi_T$. We require our class of negatively dependent measures, if it contains $\mu$, to contain 
any such $\mu'$. 

One motivation for considering such a strong closure property is that we expect it to hold when $\mu$ 
is a point mass, since then $\mu'$ is the state of an exclusion process at a fixed time. It seems reasonable 
that if the initial state is random, chosen from a negatively dependent measure $\mu$, then the state at 
time $T$ should still be negatively dependent. Another plausibility argument is that going from $\mu$ to 
$(1 - \epsilon)\mu + \epsilon\, \mu \circ \tau$ is akin to sampling without replacement. It is shown in Joag-Dev and Proschan 
(1983, example 3.2 (a)) that the values of samples drawn without replacement from a fixed (real-valued) 
population are negatively associated. If the initial population is random with a negatively dependent 
law, this should still be true. 

9. Truncation. Given $\mu$ on $\mathcal{B}_n$, let $\mu'$ be $\mu$ conditioned on $a \le \sum_j X_j \le b$. We say that $\mu'$ is 
the truncation of $\mu$ to $[a, b]$. We may ask that our class be closed under truncation. This seems the 
least controversial when $a = b$ and we are conditioning on the sum $\sum_j X_j$. In fact, Block, Savits and 
Shaked (1982) define a collection of random variables $\{X_1, \ldots, X_n\}$ to satisfy Condition N if there is 
some collection $\{Y_1, \ldots, Y_{n+1}\}$ of random variables satisfying the positive lattice condition (2) and some 
number $k$ such that the law of $\{X_1, \ldots, X_n\}$ is the law of $\{Y_1, \ldots, Y_n\}$ conditioned on $\sum_{j=1}^{n+1} Y_j = k$. 
They show that many examples of negatively dependent measures from Karlin and Rinott (1980) can be 
represented this way, and that this implies negative association. In fact, Joag-Dev and Proschan (1983, 
Theorem 2.6) show that if any random variables $\{X_e : e \in E\}$ with law $\mu$ satisfy 

$$\Bigl(\mu \Bigm| \sum_e X_e = k + 1\Bigr) \succeq \Bigl(\mu \Bigm| \sum_e X_e = k\Bigr), \qquad (9)$$

then $(\mu \mid \sum_e X_e = a)$ is negatively associated; a result of Efron (1965) is that (9) holds when the 
real-valued variables $X_e$ have densities that are log-concave, which together with Joag-Dev and Proschan's 
result yields the Karlin and Rinott result. 

Conditioning on an entire interval $[a, b]$ may seem less natural; it is a special case of the next closure 
operation. 

10. Rank rescaling. Given a measure $\mu$ on $\mathcal{B}_n$ and a log-concave sequence $q_0, \ldots, q_n$, define the rank 
rescaling of $\mu$ by $\{q_j\}$ to be the measure $\mu'$ given by 

$$\mu'(y) = \frac{q_{|y|}\, \mu(y)}{\sum_z q_{|z|}\, \mu(z)}.$$

Here $|y|$ denotes the rank of $y$ in $\mathcal{B}_n$, that is, the number of coordinates of $y$ that are 1. When 
$q_j = \mathbf{1}_{[a,b]}(j)$, this reduces to truncation. Another special case is $q_j = r^j$, which is the same as imposing a 
uniform external field. Rank rescaling may be too strong a closure property to demand, so we give two 
plausibility arguments. Firstly, observe that rank rescaling commutes with external fields. Thus when 
$\mu$ is a product Bernoulli measure, the rank rescaling of $\mu$ by $\{q_j\}$ is just an exchangeable ULC measure 
plus an external field, which we know to be CNA+. Secondly, Theorem 3.1 below shows that the closure 
of $\mathcal{S}_0$ under rank rescaling is still contained in the class JNRD+. Unfortunately, since projections do 
not commute with rank rescaling, this class is not closed under projections, so we do not know whether 
adding rank rescaling to the list of closure operations results in measures that are negatively associated. 
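
The two special cases of rank rescaling mentioned above can be made concrete. The sketch below is a minimal illustration with hypothetical function names; the product Bernoulli measure and the value $r = 3$ are arbitrary choices made for the example.

```python
from fractions import Fraction
from itertools import product

def product_bernoulli(ps):
    """Product measure on {0,1}^n with P(X_i = 1) = ps[i]."""
    mu = {}
    for x in product((0, 1), repeat=len(ps)):
        w = Fraction(1)
        for xi, p in zip(x, ps):
            w *= p if xi else 1 - p
        mu[x] = w
    return mu

def rank_rescale(mu, q):
    """mu'(y) proportional to q[|y|] * mu(y), where |y| is the number of ones."""
    z = sum(q[sum(y)] * w for y, w in mu.items())
    return {y: q[sum(y)] * w / z for y, w in mu.items()}

def external_field(mu, field):
    """mu'(y) proportional to mu(y) * prod_i field[i] ** y[i]."""
    out = {}
    for y, w in mu.items():
        for yi, wi in zip(y, field):
            if yi:
                w = w * wi
        out[y] = w
    z = sum(out.values())
    return {y: w / z for y, w in out.items()}

mu = product_bernoulli([Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)])
r = Fraction(3)
uniform_field = rank_rescale(mu, [r ** j for j in range(4)])   # q_j = r^j
truncated = rank_rescale(mu, [0, 1, 1, 0])                     # q = 1_[1,2]
print(uniform_field == external_field(mu, [r, r, r]))
```

Here `truncated` is the law of the product measure conditioned on having one or two coordinates equal to 1, which is the truncation to $[1, 2]$.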

A concrete application in which we would like to have these closure properties is the random forest. 
Let $G$ be a graph with $n$ vertices and edge set $E(G)$ and define the uniform random forest $\eta : E(G) \to 
\{0, 1\}$ to be chosen uniformly among subsets of $E(G)$ with no cycles. Thus we generalize the well studied 
spanning tree model by allowing more than one component. Peter Winkler (personal communication) 
asks whether any negative dependence can be shown for this model. Together with closure under 
truncation, this would imply negative correlations in constrained random forests, the simplest one of 
these being when $\eta$ is chosen from acyclic edge sets with cardinality either $n - 1$ or $n - 2$. There seems 
to be no negative correlation result known even in this simple setting. 
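
For a single small graph the question can at least be explored by enumeration. The following sketch computes pairwise edge covariances for the uniform random forest on the complete graph $K_4$; the choice of graph and all function names are assumptions made for illustration, and a computation on one graph says nothing about the general question.

```python
from fractions import Fraction
from itertools import combinations

def is_acyclic(n_vertices, edges):
    """Union-find test that the edge set contains no cycle."""
    parent = list(range(n_vertices))

    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[ru] = rv
    return True

def forests(n_vertices, edge_list):
    """All acyclic subsets of edge_list, as tuples of edges."""
    return [s for k in range(len(edge_list) + 1)
            for s in combinations(edge_list, k) if is_acyclic(n_vertices, s)]

def edge_covariance(forest_list, e, f):
    """Covariance of the indicators of e and f under the uniform measure."""
    total = len(forest_list)
    p_e = Fraction(sum(e in s for s in forest_list), total)
    p_f = Fraction(sum(f in s for s in forest_list), total)
    p_ef = Fraction(sum(e in s and f in s for s in forest_list), total)
    return p_ef - p_e * p_f

k4_edges = list(combinations(range(4), 2))
k4_forests = forests(4, k4_edges)
covs = {(e, f): edge_covariance(k4_forests, e, f)
        for e, f in combinations(k4_edges, 2)}
print(len(k4_forests), max(covs.values()))
```

Filtering `k4_forests` to edge sets of cardinality 2 or 3 before calling `edge_covariance` gives the constrained model with $n - 1$ or $n - 2$ edges described above.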

3.2 Building a class of negatively dependent measures from the inside 

In this section we prove the following theorem, showing that asking for closure under rank rescaling is 
reasonable. 

Theorem 3.1 Let $\mathcal{S}$ be the smallest class of measures containing laws of single Bernoulli random 
variables and closed under products, external fields and rank rescaling. Then every measure in $\mathcal{S}$ is JNRD+. 

The theorem is proved in several steps. 

Step 1: Represent each $\mu$ in $\mathcal{S}$ by a tree. Observe that external fields commute with products and rank 
rescaling. Since an external field changes a Bernoulli variable into another Bernoulli, all measures in 
$\mathcal{S}$ are built from Bernoulli laws by products and rank rescaling. Let $T$ be a finite rooted tree, with 
each leaf $e$ labeled by a Bernoulli law $\nu_e$, and each interior vertex $v$ labeled by a log-concave sequence 
$\{q^v_j\}$, whose length is one more than the number of leaves below $v$. Associate a measure $\mu_v$ to each 
interior vertex $v$ recursively, by letting $\mu_v$ be the rank rescaling by $\{q^v_j\}$ of the product of the measures 
associated with the subtrees of $v$. Then the above observation implies that every measure in $\mathcal{S}$ is the 
measure associated with the root of such a tree $T$, so that if the measure is the law of $\{X(e) : e \in S\}$ 
then the set of leaves of $T$ is precisely $S$. We may assume without loss of generality that every interior 
vertex of $T$ has precisely two children. We also note that log-concavity is closed under convolution and 
pointwise products, and thus by an easy induction the rank sequence for every measure $\mu_v$ associated 
with any vertex $v$ of such a tree is log-concave. 

Step 2: Use Lemma 2.5. For any vertex $v$ of $T$, define $Y_v$ to be the sum of $X_e$ over all leaves $e$ lying below 
$v$ (the root is at the top). Suppose $e$ and $f$ are two leaves of $T$ and let $v$ be their meeting vertex, that 
is, the lowest vertex of $T$ having both $e$ and $f$ as descendants. Let $e = e_0, e_1, \ldots, e_k, v, f_l, \ldots, f_0 = f$ be 
the geodesic connecting $e$ and $f$ in $T$. I claim that the sequence $\{Y_{e_0}, \ldots, Y_{e_k}, Y_{f_l}, \ldots, Y_{f_0}\}$ is Markov, 
and that each is stochastically increasing in the previous one, except that $Y_{f_l}$ is stochastically decreasing 
in $Y_{e_k}$. The conclusion of this step, which follows immediately from Lemma 2.5 once the claims are 
established, is that $X_e$ and $X_f$ are negatively correlated. 

Establishing the Markov property is a diagram chase. Use the notation $g > v$ to denote that the 
leaf $g$ is a descendant of the vertex $v$. Slightly stronger than the Markov property is the fact that the 
collection $\{X_g : g > e_{j-1}\}$ and the collection $\{X_g : g \not> e_j\}$ are independent given $Y_{e_j}$. To see that this 
independence property holds, write 

$$\mu(X_g = x_g : g \in S) = C \prod_{g \in S} \nu_g(x_g) \prod_{v \text{ interior}} q^v_{y_v},$$

where $y_v := \sum_{g > v} x_g$. Now observe that the only factors in the product depending both on values $x_g$ 
for $g > e_{j-1}$ and for $g \not> e_j$ depend only on the total $y_{e_j}$, giving us the desired conditional independence. 

Step 3: Verify the part of the claim involving stochastic dependence. We first record a simple lemma. 



Lemma 3.2 Let $\{a_n\}$, $\{b_n\}$, $\{c_n\}$ be finite sequences of nonnegative real numbers, with $a_i b_j c_{i+j}$ not 
identically zero. Let $X$ and $Y$ be random variables such that 

$$P(X = i, Y = j) = K a_i b_j c_{i+j} \qquad (10)$$

for some normalizing constant $K$. Then 

(i) $X \uparrow (X + Y)$ if $b$ is log-concave; $Y \uparrow (X + Y)$ if $a$ is log-concave; 

(ii) $(X + Y) \uparrow X$ if $b$ is log-concave; $(X + Y) \uparrow Y$ if $a$ is log-concave; 

(iii) $X \downarrow Y$ if $c$ is log-concave; $Y \downarrow X$ if $c$ is log-concave. 

Proof: By symmetry it suffices to prove the first half of each statement. We use the fact that if $\mu$ and 
$\nu$ are probability measures on the integers with $\mu(x)/\nu(x)$ increasing in $x$, then $\mu \succeq \nu$. 

For statement (i), let $\mu$ be the conditional distribution of $X$ given $X + Y = j$, and let $\nu$ be the 
conditional distribution of $X$ given $X + Y = j + 1$ (we deal only with the interval of values of $j$ for 
which we are conditioning on events of positive probability). Then $\mu(x) = C a_x b_{j-x} c_j$ for some constant 
$C$, while $\nu(x) = C' a_x b_{j+1-x} c_{j+1}$ for some $C'$. Hence $\mu(x)/\nu(x) = C'' b_{j-x}/b_{j+1-x}$, which is decreasing 
in $x$ as long as $\{b_j\}$ is log-concave. Statements (ii) and (iii) are proved similarly. For (iii), let $\mu$ be the 
conditional distribution of $X$ given $Y = j$ and $\nu$ be the conditional distribution of $X$ given $Y = j + 1$. 
Then $\mu(x)/\nu(x) = C c_{j+x}/c_{j+x+1}$, which is increasing in $x$ if $\{c_j\}$ is log-concave. And for (ii), let $\mu$ be 
the conditional distribution of $X + Y$ given $X = j$ and $\nu$ be the conditional distribution of $X + Y$ given 
$X = j + 1$. Then $\mu(x)/\nu(x) = C b_{x-j}/b_{x-j-1}$, which decreases in $x$ when $\{b_j\}$ is log-concave. □ 
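
The three monotonicity statements of Lemma 3.2 can be checked on a small example. The sketch below is an illustration only: the sequences `a`, `b`, `c` are arbitrary log-concave choices, the function names are hypothetical, and stochastic domination is tested through upper tail sums rather than through the likelihood-ratio argument used in the proof.

```python
from fractions import Fraction

def joint_law(a, b, c):
    """P(X = i, Y = j) proportional to a[i] * b[j] * c[i + j], as in (10)."""
    w = {(i, j): Fraction(a[i] * b[j] * c[i + j])
         for i in range(len(a)) for j in range(len(b))}
    z = sum(w.values())
    return {ij: p / z for ij, p in w.items()}

def conditional_laws(law, target, given):
    """Map each value g of given(i, j) to the conditional law of target(i, j)."""
    laws = {}
    for (i, j), p in law.items():
        if p > 0:
            d = laws.setdefault(given(i, j), {})
            d[target(i, j)] = d.get(target(i, j), 0) + p
    return {g: {t: p / sum(d.values()) for t, p in d.items()}
            for g, d in laws.items()}

def dominates(mu, nu):
    """True if mu stochastically dominates nu: mu[t, oo) >= nu[t, oo) for all t."""
    def tail(m, t):
        return sum(p for s, p in m.items() if s >= t)
    return all(tail(mu, t) >= tail(nu, t) for t in set(mu) | set(nu))

def monotone(laws, increasing=True):
    """Check that the conditional laws are monotone in the conditioning value."""
    keys = sorted(laws)
    pairs = zip(keys[1:], keys) if increasing else zip(keys, keys[1:])
    return all(dominates(laws[big], laws[small]) for big, small in pairs)

a, b, c = [1, 2, 1], [1, 3, 2], [1, 2, 3, 3, 2]   # all log-concave
law = joint_law(a, b, c)
part_i = monotone(conditional_laws(law, lambda i, j: i, lambda i, j: i + j))
part_ii = monotone(conditional_laws(law, lambda i, j: i + j, lambda i, j: i))
part_iii = monotone(conditional_laws(law, lambda i, j: i, lambda i, j: j),
                    increasing=False)
print(part_i, part_ii, part_iii)
```

Checking consecutive conditioning values suffices because stochastic domination is transitive.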

The stochastic increases in the sequence $\{Y_{e_0}, \ldots, Y_{e_k}, Y_{f_l}, \ldots, Y_{f_0}\}$ are now easy to verify. Let $w$ be 
the child of $e_{j+1}$ that is not $e_j$, let $X = Y_{e_j}$, and let $Y = Y_w$. Recall from the recursive construction 
of the measures that $\mu_{e_j}$ gives $X$ a log-concave sequence of probabilities, call it $\{a_i\}$, that $\mu_w$ gives $Y$ 
a log-concave sequence of probabilities, call it $\{b_i\}$, and that $\mu_{e_{j+1}}$ gives probabilities as in (10) with 
$c_i = q^{e_{j+1}}_i$. Replacing $\mu_{e_{j+1}}$ by the measure $\mu$ associated with the root of the tree effectively alters 
the sequence $\{c_i\}$ but not $\{a_i\}$ or $\{b_i\}$. Since the sequences $\{a_i\}$ and $\{b_i\}$ are log-concave, parts (i) 
and (ii) of the previous lemma imply that $X$ is stochastically increasing in $X + Y$ and vice versa. Since 
$X + Y = Y_{e_{j+1}}$, and since the argument works equally well for $f_j$ instead of $e_j$, this gives all parts of the 
claim except the fact that $Y_{f_l} \downarrow Y_{e_k}$. 

Let $v$ be the common parent of $e_k$ and $f_l$. As before, we see that under the law $\mu_v$, $Y_{f_l}$ is 
stochastically decreasing in $Y_{e_k}$, according to statement (iii) of the lemma with $c_i = q^v_i$, which is log-concave. 
Transferring this argument to the measure $\mu$ is mostly a matter of using the right notation to make it 
clear that the new sequence $\{c_i\}$ is log-concave. Let $v = v_0, v_1, \ldots, v_r$ be the path leading from $v$ to 
the root, and for $1 \le i \le r$, let $w_i$ be the child of $v_i$ not equal to $v_{i-1}$. Let $a_i = \mu_{e_k}(Y_{e_k} = i)$ and 
$b_i = \mu_{f_l}(Y_{f_l} = i)$. Let $s^j_i = q^{v_j}_i$ and let $t^j_i = \mu_{w_j}(Y_{w_j} = i)$. Use the recursive definition of the measures 
$\mu_v$ to see that 

$$\mu(Y_{e_k} = i, Y_{f_l} = j) = C\, a_i\, b_j\, q^v_{i+j} \sum_{u_1, \ldots, u_r} \prod_{m=1}^{r} t^m_{u_m}\, s^m_{i+j+u_1+\cdots+u_m}.$$

The summation term may be written as 

$$\bigl(\cdots\bigl(\bigl(s^r * \overline{t^r}\bigr) \cdot s^{r-1}\bigr) * \overline{t^{r-1}} \cdots \cdot s^1\bigr) * \overline{t^1}, \qquad (11)$$

where $*$ denotes convolution, $\cdot$ denotes pointwise product, the overline denotes reversal, and $s^j$ and $t^j$ denote 
the sequences $\{s^j_i\}$ and $\{t^j_i\}$. Since convolution, pointwise product and reversal preserve log-concavity, 
this shows that the third part of Lemma 3.2 still applies, and finishes the verification. 

Step 4: Negative correlation implies h-NLC+. Observe that the property h-NLC+ is the same as NC+,
where NC denotes pairwise negative correlation. To see this, note that an external field with $W(e) \to 0$
or $\infty$ corresponds to conditioning on $X_e = 0$ or $1$ respectively. Thus NC+ is equivalent to negative
correlation of any pair of variables, given values of any others, under any external field, which is
h-NLC+. The conclusion of steps 2 and 3 was the NC property, and hence NC+, since the class is
already closed under external fields.
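
The correspondence between extreme external fields and conditioning can be illustrated on a small example. This is a sketch under stated assumptions: an external field is taken to reweight $\mu(x)$ by the product of $W(e)$ over coordinates with $x_e = 1$ and then renormalize, and the measure below (uniform on the 2-element subsets of a 3-element set) is an illustrative choice, not one from the paper. The function names are hypothetical.

```python
from fractions import Fraction


def apply_field(mu, W):
    """Reweight mu(x) by the product of W[e] over coordinates with x[e] == 1, then normalize."""
    tilted = {}
    for x, p in mu.items():
        weight = Fraction(p)
        for e, xe in enumerate(x):
            if xe == 1:
                weight *= W[e]
        tilted[x] = weight
    z = sum(tilted.values())
    return {x: w / z for x, w in tilted.items()}


def condition(mu, e, value):
    """Conditional law of the whole configuration given X_e == value."""
    z = sum(p for x, p in mu.items() if x[e] == value)
    return {x: (p / z if x[e] == value else Fraction(0)) for x, p in mu.items()}


def total_variation(mu, nu):
    """Total variation distance between two laws on the same finite set."""
    return sum(abs(mu[x] - nu[x]) for x in mu) / 2


# Illustrative measure (an assumption, not from the paper).
mu = {(1, 1, 0): Fraction(1, 3), (1, 0, 1): Fraction(1, 3), (0, 1, 1): Fraction(1, 3)}

field_zero = apply_field(mu, [1, 1, 0])          # W(e) = 0 on the last coordinate
field_large = apply_field(mu, [1, 1, 10 ** 6])   # W(e) very large on the last coordinate
gap = total_variation(field_large, condition(mu, 2, 1))
```

Here a field of exactly zero reproduces conditioning on $X_e = 0$, while a large field is close in total variation to conditioning on $X_e = 1$.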

Step 5: Modifying the argument to get JNRD+. Let $e$ be a leaf of $T$ and let $v_0, v_1, \ldots, v_k$ be the path
from $e$ to the root, with $v_0 = e$. Let $w_i$ be the child of $v_i$ other than $v_{i-1}$. I claim that the vector
$(Y_{w_1}, \ldots, Y_{w_k})$ is stochastically decreasing in $X_e$. This is shown by coupling, inducting on $i$. We will
define a sequence $(Y_1, \ldots, Y_k)$ to have the conditional distribution of $(Y_{w_1}, \ldots, Y_{w_k})$ given $X_e = 0$ and
$(Y_1', \ldots, Y_k')$ to have the conditional distribution of $(Y_{w_1}, \ldots, Y_{w_k})$ given $X_e = 1$ so that
$(Y_1 - Y_1', \ldots, Y_k - Y_k')$ has all coordinates zero except possibly for a single 1.

When $i = 1$, we have $Y_{w_1} \downarrow X_e$ by part (iii) of Lemma 3.2, using log-concavity of a sequence
analogous to (11). Since also $Y_{w_1} + X_e \uparrow X_e$ by part (i) of the lemma and log-concavity of the rank
sequence for $Y_{w_1}$, this means we can define $Y_1$ and $Y_1'$ so that $Y_1$ has the distribution of $Y_{w_1}$ given
$X_e = 0$, $Y_1'$ has the distribution of $Y_{w_1}$ given $X_e = 1$ and $Y_1' + 1 \ge Y_1 \ge Y_1'$. If $Y_1 = Y_1' + 1$, then choose
$(Y_2, \ldots, Y_k)$ to have the conditional distribution of $(Y_{w_2}, \ldots, Y_{w_k})$ given $X_e = 0$ and $Y_{w_1} = Y_1$. This is
the same as the conditional distribution of $(Y_{w_2}, \ldots, Y_{w_k})$ given $X_e = 1$ and $Y_{w_1} = Y_1'$, so we may choose
$(Y_2', \ldots, Y_k') = (Y_2, \ldots, Y_k)$. If $Y_1 = Y_1'$, then choose $Y_2'$ and $Y_2$ from the conditional distribution for $Y_{w_2}$
given respectively that $Y_{v_1} = Y_1 + 1$ and $Y_1$. Again $Y_2' + 1 \ge Y_2 \ge Y_2'$, and we continue, setting the
remaining coordinates equal if $Y_2 = Y_2' + 1$, and otherwise choosing $Y_3$ and $Y_3'$ and so on.

The collections $\{X_f : f \ge w_i\}$ are conditionally independent as $i$ varies given $\{Y_{w_i} : 1 \le i \le k\}$. Thus
we may write the conditional law of $\{X_f : f \ne e\}$ given $X_e = 0$ as a mixture over values $(r_1, \ldots, r_k)$ of
$(Y_1, \ldots, Y_k)$ of product measures $\prod_j \mu_{j,r_j}$, where $\mu_{j,r_j}$ is the conditional law of $\{X_f : f \ge w_j\}$ given
$Y_{w_j} = r_j$. The conditional law of $\{X_f : f \ne e\}$ given $X_e = 1$ is the same, but with a stochastically
smaller mixing measure. Suppose the laws $\mu_{j,r_j}$ are stochastically increasing in $r_j$. Then by stochastic
comparison of the mixing measures, we see that the conditional law of $\{X_f : f \ne e\}$ given $X_e = 0$
dominates the conditional law of $\{X_f : f \ne e\}$ given $X_e = 1$. The measures $\mu_{j,r_j}$ are in the class $S$ ($S$ is
not closed under projection, but projection onto all variables in a subtree is OK). Thus all that remains
to verify JNRD+ is to prove the supposition, which is the following lemma.

Lemma 3.3 For any measure $\mu$ in the class $S$, the conditional distribution of $\mu$ given $\sum_e X_e = k + 1$
stochastically dominates the conditional distribution given $\sum_e X_e = k$.

To prove this we strengthen Lemma 3.2 a little. Recall that an element of a partially ordered set
covers another if it is greater and there is no element in between. Say that a measure $\mu$ on a partially
ordered set covers the measure $\nu$ if there are random variables $X \sim \mu$ and $Y \sim \nu$ such that $X = Y$ or
$X$ covers $Y$.

Lemma 3.4 Under the hypotheses of Lemma 3.2, if $\{a_n\}$ is log-concave, then $(X \mid X + Y = k + 1)$ covers
$(X \mid X + Y = k)$ and if $\{c_n\}$ is log-concave then $(X + Y \mid X = k + 1)$ covers $(X + Y \mid X = k)$.

Proof: The likelihood ratio of the law of $X$ conditioned on $X + Y = k + 1$ to the law of $X + 1$ conditioned
on $X + Y = k$, evaluated at the point $x$, is equal to $a_x b_{k+1-x} c_{k+1} / (a_{x-1} b_{k+1-x} c_k) = (c_{k+1}/c_k)(a_x/a_{x-1})$.
This is decreasing in $x$ by log-concavity of $\{a_n\}$. The likelihood ratio of the law of $X + Y$ given $X = k + 1$
to the law of $X + Y + 1$ given $X = k$, evaluated at the point $z$, is $a_{k+1} c_z / (a_k c_{z-1})$, which is decreasing
in $z$ by log-concavity of $\{c_n\}$. □
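
The first half of Lemma 3.4 can be sanity-checked numerically. The sketch below assumes the joint law has the form $P(X = x, Y = y) \propto a_x b_y c_{x+y}$ with both $\{a_n\}$ and $\{b_n\}$ log-concave, so that $\{c_n\}$ cancels when conditioning on $X + Y$. It uses the fact that, on the integers, one law covers another exactly when it lies between that law and its shift by one in the stochastic order. The function names and sample sequences are illustrative assumptions.

```python
from fractions import Fraction


def cond_law_of_x(a, b, k):
    """Law of X given X + Y = k when P(X=x, Y=y) is proportional to a[x]*b[y]*c[x+y]."""
    weights = {x: a[x] * b[k - x] for x in range(len(a)) if 0 <= k - x < len(b)}
    z = sum(weights.values())
    return {x: Fraction(w, z) for x, w in weights.items() if w > 0}


def cdf(law, t):
    """P(value <= t) under a law given as a dict from integers to probabilities."""
    return sum(p for x, p in law.items() if x <= t)


def covers_on_integers(upper, lower):
    """True if lower <= upper <= lower + 1 in the stochastic order (checked via CDFs)."""
    lo = min(min(upper), min(lower))
    hi = max(max(upper), max(lower) + 1)
    return all(cdf(lower, t - 1) <= cdf(upper, t) <= cdf(lower, t) for t in range(lo, hi + 1))


# Hypothetical log-concave sequences for X and Y.
a, b = [1, 3, 3, 1], [1, 2, 1]
covering_holds = all(
    covers_on_integers(cond_law_of_x(a, b, k + 1), cond_law_of_x(a, b, k))
    for k in range(len(a) + len(b) - 2)
)

# With a sequence that is not log-concave the covering property can fail.
bad_a, bad_b = [1, 1, 4], [1, 1]
covering_fails = not covers_on_integers(cond_law_of_x(bad_a, bad_b, 2), cond_law_of_x(bad_a, bad_b, 1))
```

In the second example the conditional law still increases stochastically with $k$, but it jumps by more than one step with positive probability, so it does not cover.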

Proof of Lemma 3.3: Induct on the height of the tree $T$. If $T$ is a single leaf, then the statement is
trivial. Now suppose the root of $T$ has children $v$ and $w$ and assume for induction that the lemma holds
for $\mu_v$ and $\mu_w$. Since the rank sequences for $Y_v$ and $Y_w$ are log-concave, part (i) of Lemma 3.2 shows
that $Y_v$ and $Y_w$ are each stochastically increasing in $Y_v + Y_w$. By Lemma 3.4, in fact the law of $Y_v$ given
$Y_v + Y_w = k + 1$ covers the law of $Y_v$ given $Y_v + Y_w = k$, from which we conclude that the pair $(Y_v, Y_w)$
is stochastically increasing in $Y_v + Y_w$. By the inductive hypothesis, $\{X_e : e \ge v\}$ is stochastically
increasing in $Y_v$ and the same is true with $v$ replaced by $w$. Since $\{X_e : e \ge v\}$ and $\{X_e : e \ge w\}$ are
conditionally independent given $Y_v$ and $Y_w$, this finishes the proof. □



3.3 Further observations and conjectures 

Lemma 3.3 seems to be true in the following greater generality. 

Conjecture 8 If $\mu$ is CNA+ then the conditional distribution of $\mu$ given $\sum_e X_e = k + 1$ stochastically
dominates the conditional distribution of $\mu$ given $\sum_e X_e = k$.

Remark: The conclusion of this conjecture appears in Joag-Dev and Proschan (1983) as a hypothesis 
implying negative association. Does this condition fit into the theory of negative dependence better as 
a hypothesis or a conclusion? The same could be asked about the ULC condition, cf. Conjectures 4-6.

Another conjecture that seems to be true is as follows. 

Conjecture 9 If $\mu$ on $B_n$ is CNA+ then the conditional distribution on $B_{n-1}$ given $X_n = 0$ stochastically
covers the conditional distribution given $X_n = 1$.

These conjectures may be strengthened by weakening the hypothesis to JNRD+ or h-NLC+, but the
+ condition is essential, at least for the second conjecture, as shown by the following example.

Example: Let $\mu$ be the measure on $B_3$ with equal probabilities 1/5 for the points (0, 0, 0), (0, 0, 1),
(0, 1, 0), (1, 0, 0) and (1, 1, 0). This is CNA but not h-NLC+ (impose an external field with $W(1)$ very
small). The measure $(\mu \mid X_3 = 0)$ is stochastically greater than the measure $(\mu \mid X_3 = 1)$ but is too much
greater to cover it.
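
The last sentence of the example can be verified by direct enumeration. The sketch below checks only that sentence: stochastic domination is tested on every upset of $B_2$, and the failure of covering uses the fact that $(\mu \mid X_3 = 1)$ is a point mass, so any coupling must pair that point with every point in the support of $(\mu \mid X_3 = 0)$. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

points = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 0)]
mu = {p: Fraction(1, 5) for p in points}


def conditional_on_last(mu, value):
    """Law of the first n-1 coordinates given that the last coordinate equals value."""
    selected = {x[:-1]: p for x, p in mu.items() if x[-1] == value}
    z = sum(selected.values())
    return {x: p / z for x, p in selected.items()}


def leq(x, y):
    """Coordinatewise partial order on the Boolean cube."""
    return all(xi <= yi for xi, yi in zip(x, y))


def upsets(n):
    """Enumerate all upward-closed subsets of the n-dimensional Boolean cube."""
    cube = list(product((0, 1), repeat=n))
    for r in range(len(cube) + 1):
        for subset in combinations(cube, r):
            s = set(subset)
            if all(y in s for x in s for y in cube if leq(x, y)):
                yield s


def dominates(upper, lower, n):
    """True if upper gives at least as much mass as lower to every upset."""
    return all(
        sum(upper.get(x, 0) for x in A) >= sum(lower.get(x, 0) for x in A)
        for A in upsets(n)
    )


def mass_beyond_one_step(upper, base):
    """Mass of upper on points that neither equal base nor cover it."""
    return sum(p for x, p in upper.items() if not (leq(base, x) and sum(x) - sum(base) <= 1))


given0 = conditional_on_last(mu, 0)   # uniform on B_2
given1 = conditional_on_last(mu, 1)   # point mass at (0, 0)
is_greater = dominates(given0, given1, 2)
excess = mass_beyond_one_step(given0, (0, 0))
```

The positive value of `excess` is the mass that any coupling must place two steps above $(0, 0)$, which is why covering fails even though domination holds.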

Question 10 Under what hypotheses on $\mu$ can one prove that



An answer to this question would be important for the following reason. Let $A$ be any upset. If we
can establish (12), then $A \uparrow \sum_e X_e$ and in particular these have nonnegative covariance. Therefore $A$
and $X_e$ have nonnegative covariance for some $e$ and we have established property (ii) of Section 1.5. In
particular, Conjecture 8 implies Conjecture 2.

Acknowledgements: Most of the blame for this goes to Peter Doyle for egging me on in the early 
going and for proving Theorem 3.1 with me. Thanks to Peter Shor for suggestions pertaining to the urn 
model. Thanks to Yosi Rinott for some helpful discussions on a previous draft of this paper. 